From patchwork Thu Jun 8 13:39:21 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 70785
X-Original-To: libc-alpha@sourceware.org
CC: Joe Ramsay
Subject: [PATCH 2/4] aarch64: Add vector implementations of sin routines
Date: Thu, 8 Jun 2023 14:39:21 +0100
Message-ID: <20230608133923.31680-2-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20230608133923.31680-1-Joe.Ramsay@arm.com>
References: <20230608133923.31680-1-Joe.Ramsay@arm.com>
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Joe Ramsay via Libc-alpha
From: Joe Ramsay
Reply-To: Joe Ramsay

Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm
Optimized Routines. Also allow certain tests to be skipped for mathvec
routines, for example both AdvSIMD algorithms discard the sign of 0.
---
 math/auto-libm-test-out-sin                   |   4 +-
 math/gen-libm-test.py                         |   3 +-
 sysdeps/aarch64/fpu/Makefile                  |   8 +-
 sysdeps/aarch64/fpu/Versions                  |   4 +
 sysdeps/aarch64/fpu/bits/math-vector.h        |   6 ++
 sysdeps/aarch64/fpu/sin_advsimd.c             | 100 ++++++++++++++++++
 sysdeps/aarch64/fpu/sin_sve.c                 |  96 +++++++++++++++++
 sysdeps/aarch64/fpu/sinf_advsimd.c            |  93 ++++++++++++++++
 sysdeps/aarch64/fpu/sinf_sve.c                |  92 ++++++++++++++++
 sysdeps/aarch64/fpu/sv_horner_wrap.h          |  55 ++++++++++
 sysdeps/aarch64/fpu/sv_hornerf.h              |  24 +++++
 .../fpu/test-double-advsimd-wrappers.c        |   1 +
 .../aarch64/fpu/test-double-sve-wrappers.c    |   1 +
 .../aarch64/fpu/test-float-advsimd-wrappers.c |   1 +
 sysdeps/aarch64/fpu/test-float-sve-wrappers.c |   1 +
 sysdeps/aarch64/libm-test-ulps                |   8 ++
 .../unix/sysv/linux/aarch64/libmvec.abilist   |   4 +
 17 files changed, 494 insertions(+), 7 deletions(-)
 create mode 100644 sysdeps/aarch64/fpu/sin_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/sin_sve.c
 create mode 100644 sysdeps/aarch64/fpu/sinf_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/sinf_sve.c
 create mode 100644 sysdeps/aarch64/fpu/sv_horner_wrap.h
 create mode 100644 sysdeps/aarch64/fpu/sv_hornerf.h

diff --git a/math/auto-libm-test-out-sin b/math/auto-libm-test-out-sin
index f1d21b179c..27ccaff1aa 100644
--- a/math/auto-libm-test-out-sin
+++ b/math/auto-libm-test-out-sin
@@ -25,11 +25,11 @@ sin 0
 = sin upward ibm128 0x0p+0 : 0x0p+0 : inexact-ok
 sin -0
 = sin downward binary32 -0x0p+0 : -0x0p+0 : inexact-ok
-= sin tonearest binary32 -0x0p+0 : -0x0p+0 : inexact-ok
+= sin tonearest binary32 -0x0p+0 : -0x0p+0 : inexact-ok no-mathvec
 = sin towardzero binary32 -0x0p+0 : -0x0p+0 : inexact-ok
 = sin upward binary32 -0x0p+0 : -0x0p+0 : inexact-ok
 = sin downward binary64 -0x0p+0 : -0x0p+0 : inexact-ok
-= sin tonearest binary64 -0x0p+0 : -0x0p+0 : inexact-ok
+= sin tonearest binary64 -0x0p+0 : -0x0p+0 : inexact-ok no-mathvec
 = sin towardzero binary64 -0x0p+0 : -0x0p+0 : inexact-ok
 = sin upward binary64 -0x0p+0 : -0x0p+0 : inexact-ok
 = sin downward intel96 -0x0p+0 : -0x0p+0 : inexact-ok
diff --git a/math/gen-libm-test.py b/math/gen-libm-test.py
index 6ae78beb01..a573c3b8cb 100755
--- a/math/gen-libm-test.py
+++ b/math/gen-libm-test.py
@@ -93,7 +93,8 @@ BEAUTIFY_MAP = {'minus_zero': '-0',

 # Flags in auto-libm-test-out that map directly to C flags.
 FLAGS_SIMPLE = {'ignore-zero-inf-sign': 'IGNORE_ZERO_INF_SIGN',
-                'xfail': 'XFAIL_TEST'}
+                'xfail': 'XFAIL_TEST',
+                'no-mathvec': 'NO_TEST_MATHVEC'}

 # Exceptions in auto-libm-test-out, and their corresponding C flags
 # for being required, OK or required to be absent.
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 850cfb9012..b3285542ea 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -1,10 +1,10 @@
-float-advsimd-funcs = cos
+float-advsimd-funcs = cos sin

-double-advsimd-funcs = cos
+double-advsimd-funcs = cos sin

-float-sve-funcs = cos
+float-sve-funcs = cos sin

-double-sve-funcs = cos
+double-sve-funcs = cos sin

 ifeq ($(subdir),mathvec)
 libmvec-support = $(addsuffix f_advsimd,$(float-advsimd-funcs)) \
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index 5222a6f180..d26b3968a9 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -1,8 +1,12 @@
 libmvec {
   GLIBC_2.38 {
     _ZGVnN2v_cos;
+    _ZGVnN2v_sin;
     _ZGVnN4v_cosf;
+    _ZGVnN4v_sinf;
     _ZGVsMxv_cos;
     _ZGVsMxv_cosf;
+    _ZGVsMxv_sin;
+    _ZGVsMxv_sinf;
   }
 }
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index a2f2277591..ad9c9945e8 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -50,7 +50,10 @@ typedef __SVBool_t __sv_bool_t;
 # define __vpcs __attribute__ ((__aarch64_vector_pcs__))

 __vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
+__vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t);
+
 __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
+__vpcs __f64x2_t _ZGVnN2v_sin (__f64x2_t);

 # undef __ADVSIMD_VEC_MATH_SUPPORTED
 #endif /* __ADVSIMD_VEC_MATH_SUPPORTED */
@@ -58,7 +61,10 @@ __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
 #ifdef __SVE_VEC_MATH_SUPPORTED

 __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
+__sv_f32_t _ZGVsMxv_sinf (__sv_f32_t, __sv_bool_t);
+
 __sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t);
+__sv_f64_t _ZGVsMxv_sin (__sv_f64_t, __sv_bool_t);

 # undef __SVE_VEC_MATH_SUPPORTED
 #endif /* __SVE_VEC_MATH_SUPPORTED */
diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c
new file mode 100644
index 0000000000..1206a5d760
--- /dev/null
+++ b/sysdeps/aarch64/fpu/sin_advsimd.c
@@ -0,0 +1,100 @@
+/* Double-precision vector (Advanced SIMD) sin function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include "v_math.h"
+
+static const volatile struct
+{
+  float64x2_t poly[7];
+  float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
+} data = {
+  /* Worst-case error is 2.8 ulp in [-pi/2, pi/2]. */
+  .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
+            V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
+            V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
+            V2 (-0x1.9e9540300a1p-41) },
+
+  .range_val = V2 (0x1p23),
+  .inv_pi = V2 (0x1.45f306dc9c883p-2),
+  .pi_1 = V2 (0x1.921fb54442d18p+1),
+  .pi_2 = V2 (0x1.1a62633145c06p-53),
+  .pi_3 = V2 (0x1.c1cd129024e09p-106),
+  .shift = V2 (0x1.8p52),
+};
+
+#if WANT_SIMD_EXCEPT
+# define TinyBound v_u64 (0x3000000000000000) /* asuint64 (0x1p-255). */
+# define Thresh v_u64 (0x1160000000000000) /* RangeVal - TinyBound. */
+#endif
+
+#define C(i) data.poly[i]
+
+static float64x2_t VPCS_ATTR NOINLINE
+special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
+{
+  y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
+  return v_call_f64 (sin, x, y, cmp);
+}
+
+float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x)
+{
+  float64x2_t n, r, r2, r3, r4, y, t1, t2, t3;
+  uint64x2_t odd, cmp;
+
+#if WANT_SIMD_EXCEPT
+  /* Detect |x| <= TinyBound or |x| >= RangeVal. If fenv exceptions are to be
+     triggered correctly, set any special lanes to 1 (which is neutral w.r.t.
+     fenv). These lanes will be fixed by special-case handler later. */
+  uint64x2_t ir = vreinterpretq_u64_f64 (vabsq_f64 (x));
+  cmp = vcgeq_u64 (vsubq_u64 (ir, TinyBound), Thresh);
+  r = vbslq_f64 (cmp, vreinterpretq_f64_u64 (cmp), x);
+#else
+  r = x;
+  cmp = vcageq_f64 (data.range_val, x);
+  cmp = vceqzq_u64 (cmp); /* cmp = ~cmp. */
+#endif
+
+  /* n = rint(|x|/pi). */
+  n = vfmaq_f64 (data.shift, data.inv_pi, r);
+  odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63);
+  n = vsubq_f64 (n, data.shift);
+
+  /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
+  r = vfmsq_f64 (r, data.pi_1, n);
+  r = vfmsq_f64 (r, data.pi_2, n);
+  r = vfmsq_f64 (r, data.pi_3, n);
+
+  /* sin(r) poly approx. */
+  r2 = vmulq_f64 (r, r);
+  r3 = vmulq_f64 (r2, r);
+  r4 = vmulq_f64 (r2, r2);
+
+  t1 = vfmaq_f64 (C (4), C (5), r2);
+  t2 = vfmaq_f64 (C (2), C (3), r2);
+  t3 = vfmaq_f64 (C (0), C (1), r2);
+
+  y = vfmaq_f64 (t1, C (6), r4);
+  y = vfmaq_f64 (t2, y, r4);
+  y = vfmaq_f64 (t3, y, r4);
+  y = vfmaq_f64 (r, y, r3);
+
+  if (unlikely (v_any_u64 (cmp)))
+    return special_case (x, y, odd, cmp);
+  return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
+}
diff --git a/sysdeps/aarch64/fpu/sin_sve.c b/sysdeps/aarch64/fpu/sin_sve.c
new file mode 100644
index 0000000000..3750700759
--- /dev/null
+++ b/sysdeps/aarch64/fpu/sin_sve.c
@@ -0,0 +1,96 @@
+/* Double-precision vector (SVE) sin function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include "sv_math.h"
+
+static struct
+{
+  double inv_pi, half_pi, inv_pi_over_2, pi_over_2_1, pi_over_2_2, pi_over_2_3,
+      shift;
+} data = {
+  /* Polynomial coefficients are hard-wired in the FTMAD instruction. */
+  .inv_pi = 0x1.45f306dc9c883p-2,
+  .half_pi = 0x1.921fb54442d18p+0,
+  .inv_pi_over_2 = 0x1.45f306dc9c882p-1,
+  .pi_over_2_1 = 0x1.921fb50000000p+0,
+  .pi_over_2_2 = 0x1.110b460000000p-26,
+  .pi_over_2_3 = 0x1.1a62633145c07p-54,
+  .shift = 0x1.8p52
+};
+
+#define RangeVal 0x4160000000000000 /* asuint64 (0x1p23). */
+
+static svfloat64_t NOINLINE
+special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
+{
+  return sv_call_f64 (sin, x, y, cmp);
+}
+
+/* A fast SVE implementation of sin based on trigonometric
+   instructions (FTMAD, FTSSEL, FTSMUL).
+   Maximum observed error in 2.52 ULP:
+   SV_NAME_D1 (sin)(0x1.2d2b00df69661p+19) got 0x1.10ace8f3e786bp-40
+                                          want 0x1.10ace8f3e7868p-40. */
+svfloat64_t SV_NAME_D1 (sin) (svfloat64_t x, const svbool_t pg)
+{
+  svfloat64_t r = svabs_f64_x (pg, x);
+  svuint64_t sign
+      = sveor_u64_x (pg, svreinterpret_u64_f64 (x), svreinterpret_u64_f64 (r));
+  svbool_t cmp = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal);
+
+  /* Load first two pio2-related constants to one vector. */
+  svfloat64_t invpio2_and_pio2_1
+      = svld1rq_f64 (svptrue_b64 (), &data.inv_pi_over_2);
+
+  /* n = rint(|x|/(pi/2)). */
+  svfloat64_t q
+      = svmla_lane_f64 (sv_f64 (data.shift), r, invpio2_and_pio2_1, 0);
+  svfloat64_t n = svsub_n_f64_x (pg, q, data.shift);
+
+  /* r = |x| - n*(pi/2) (range reduction into -pi/4 .. pi/4). */
+  r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1);
+  r = svmls_n_f64_x (pg, r, n, data.pi_over_2_2);
+  r = svmls_n_f64_x (pg, r, n, data.pi_over_2_3);
+
+  /* Final multiplicative factor: 1.0 or x depending on bit #0 of q. */
+  svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q));
+
+  /* sin(r) poly approx. */
+  svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q));
+  svfloat64_t y = sv_f64 (0.0);
+  y = svtmad_f64 (y, r2, 7);
+  y = svtmad_f64 (y, r2, 6);
+  y = svtmad_f64 (y, r2, 5);
+  y = svtmad_f64 (y, r2, 4);
+  y = svtmad_f64 (y, r2, 3);
+  y = svtmad_f64 (y, r2, 2);
+  y = svtmad_f64 (y, r2, 1);
+  y = svtmad_f64 (y, r2, 0);
+
+  /* Apply factor. */
+  y = svmul_f64_x (pg, f, y);
+
+  /* sign = y^sign. */
+  y = svreinterpret_f64_u64 (
+      sveor_u64_x (pg, svreinterpret_u64_f64 (y), sign));
+
+  if (unlikely (svptest_any (pg, cmp)))
+    return special_case (x, y, cmp);
+  return y;
+}
diff --git a/sysdeps/aarch64/fpu/sinf_advsimd.c b/sysdeps/aarch64/fpu/sinf_advsimd.c
new file mode 100644
index 0000000000..6267594000
--- /dev/null
+++ b/sysdeps/aarch64/fpu/sinf_advsimd.c
@@ -0,0 +1,93 @@
+/* Single-precision vector (Advanced SIMD) sin function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include "v_math.h"
+
+static const volatile struct
+{
+  float32x4_t poly[4];
+  float32x4_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
+} data = {
+  /* 1.886 ulp error. */
+  .poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f),
+            V4 (0x1.5b2e76p-19f) },
+
+  .pi_1 = V4 (0x1.921fb6p+1f),
+  .pi_2 = V4 (-0x1.777a5cp-24f),
+  .pi_3 = V4 (-0x1.ee59dap-49f),
+
+  .inv_pi = V4 (0x1.45f306p-2f),
+  .shift = V4 (0x1.8p+23f),
+  .range_val = V4 (0x1p20f)
+};
+
+#if WANT_SIMD_EXCEPT
+# define TinyBound v_u32 (0x21000000) /* asuint32(0x1p-61f). */
+# define Thresh v_u32 (0x28800000) /* RangeVal - TinyBound. */
+#endif
+
+#define C(i) data.poly[i]
+
+static float32x4_t VPCS_ATTR NOINLINE
+special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp)
+{
+  /* Fall back to scalar code. */
+  y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
+  return v_call_f32 (sinf, x, y, cmp);
+}
+
+float32x4_t VPCS_ATTR V_NAME_F1 (sin) (float32x4_t x)
+{
+  float32x4_t n, r, r2, y;
+  uint32x4_t odd, cmp;
+
+#if WANT_SIMD_EXCEPT
+  uint32x4_t ir = vreinterpretq_u32_f32 (vabsq_f32 (x));
+  cmp = vcgeq_u32 (vsubq_u32 (ir, TinyBound), Thresh);
+  /* If fenv exceptions are to be triggered correctly, set any special lanes
+     to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
+     special-case handler later. */
+  r = vbslq_f32 (cmp, vreinterpretq_f32_u32 (cmp), x);
+#else
+  r = x;
+  cmp = vcageq_f32 (data.range_val, x);
+  cmp = vceqzq_u32 (cmp); /* cmp = ~cmp. */
+#endif
+
+  /* n = rint(|x|/pi) */
+  n = vfmaq_f32 (data.shift, data.inv_pi, r);
+  odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31);
+  n = vsubq_f32 (n, data.shift);
+
+  /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2) */
+  r = vfmsq_f32 (r, data.pi_1, n);
+  r = vfmsq_f32 (r, data.pi_2, n);
+  r = vfmsq_f32 (r, data.pi_3, n);
+
+  /* y = sin(r) */
+  r2 = vmulq_f32 (r, r);
+  y = vfmaq_f32 (C (2), C (3), r2);
+  y = vfmaq_f32 (C (1), y, r2);
+  y = vfmaq_f32 (C (0), y, r2);
+  y = vfmaq_f32 (r, vmulq_f32 (y, r2), r);
+
+  if (unlikely (v_any_u32 (cmp)))
+    return special_case (x, y, odd, cmp);
+  return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
+}
diff --git a/sysdeps/aarch64/fpu/sinf_sve.c b/sysdeps/aarch64/fpu/sinf_sve.c
new file mode 100644
index 0000000000..4159d90534
--- /dev/null
+++ b/sysdeps/aarch64/fpu/sinf_sve.c
@@ -0,0 +1,92 @@
+/* Single-precision vector (SVE) sin function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include "sv_math.h"
+#include "sv_hornerf.h"
+
+static struct
+{
+  float poly[4];
+  /* Pi-related values to be loaded as one quad-word and used with
+     svmla_lane_f32. */
+  float negpi1, negpi2, negpi3, invpi;
+  float shift;
+} data = {
+  .poly = {
+    /* Non-zero coefficients from the degree 9 Taylor series expansion of
+       sin. */
+    -0x1.555548p-3f, 0x1.110df4p-7f, -0x1.9f42eap-13f, 0x1.5b2e76p-19f
+  },
+  .negpi1 = -0x1.921fb6p+1f,
+  .negpi2 = 0x1.777a5cp-24f,
+  .negpi3 = 0x1.ee59dap-49f,
+  .invpi = 0x1.45f306p-2f,
+  .shift = 0x1.8p+23f
+};
+
+#define RangeVal 0x49800000 /* asuint32 (0x1p20f). */
+#define C(i) data.poly[i]
+
+static svfloat32_t NOINLINE
+special_case (svfloat32_t x, svfloat32_t y, svbool_t cmp)
+{
+  return sv_call_f32 (sinf, x, y, cmp);
+}
+
+/* A fast SVE implementation of sinf.
+   Maximum error: 1.89 ULPs.
+   This maximum error is achieved at multiple values in [-2^18, 2^18]
+   but one example is:
+   SV_NAME_F1 (sin)(0x1.9247a4p+0) got 0x1.fffff6p-1 want 0x1.fffffap-1. */
+svfloat32_t SV_NAME_F1 (sin) (svfloat32_t x, const svbool_t pg)
+{
+  svfloat32_t ax = svabs_f32_x (pg, x);
+  svuint32_t sign = sveor_u32_x (pg, svreinterpret_u32_f32 (x),
+                                 svreinterpret_u32_f32 (ax));
+  svbool_t cmp = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (ax), RangeVal);
+
+  /* pi_vals are a quad-word of helper values - the first 3 elements contain
+     -pi in extended precision, the last contains 1 / pi. */
+  svfloat32_t pi_vals = svld1rq_f32 (svptrue_b32 (), &data.negpi1);
+
+  /* n = rint(|x|/pi). */
+  svfloat32_t n = svmla_lane_f32 (sv_f32 (data.shift), ax, pi_vals, 3);
+  svuint32_t odd = svlsl_n_u32_x (pg, svreinterpret_u32_f32 (n), 31);
+  n = svsub_n_f32_x (pg, n, data.shift);
+
+  /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
+  svfloat32_t r;
+  r = svmla_lane_f32 (ax, n, pi_vals, 0);
+  r = svmla_lane_f32 (r, n, pi_vals, 1);
+  r = svmla_lane_f32 (r, n, pi_vals, 2);
+
+  /* sin(r) approx using a degree 9 polynomial from the Taylor series
+     expansion. Note that only the odd terms of this are non-zero. */
+  svfloat32_t r2 = svmul_f32_x (pg, r, r);
+  svfloat32_t y = HORNER_3 (pg, r2, C);
+  y = svmla_f32_x (pg, r, r, svmul_f32_x (pg, y, r2));
+
+  /* sign = y^sign^odd. */
+  y = svreinterpret_f32_u32 (sveor_u32_x (pg, svreinterpret_u32_f32 (y),
+                                          sveor_u32_x (pg, sign, odd)));
+
+  if (unlikely (svptest_any (pg, cmp)))
+    return special_case (x, y, cmp);
+  return y;
+}
diff --git a/sysdeps/aarch64/fpu/sv_horner_wrap.h b/sysdeps/aarch64/fpu/sv_horner_wrap.h
new file mode 100644
index 0000000000..142a06d5c4
--- /dev/null
+++ b/sysdeps/aarch64/fpu/sv_horner_wrap.h
@@ -0,0 +1,55 @@
+/* Helper macros for Horner polynomial evaluation in SVE routines.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#define HORNER_1_(pg, x, c, i) FMA (pg, VECTOR (c (i + 1)), x, VECTOR (c (i)))
+#define HORNER_2_(pg, x, c, i) \
+  FMA (pg, HORNER_1_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_3_(pg, x, c, i) \
+  FMA (pg, HORNER_2_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_4_(pg, x, c, i) \
+  FMA (pg, HORNER_3_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_5_(pg, x, c, i) \
+  FMA (pg, HORNER_4_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_6_(pg, x, c, i) \
+  FMA (pg, HORNER_5_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_7_(pg, x, c, i) \
+  FMA (pg, HORNER_6_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_8_(pg, x, c, i) \
+  FMA (pg, HORNER_7_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_9_(pg, x, c, i) \
+  FMA (pg, HORNER_8_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_10_(pg, x, c, i) \
+  FMA (pg, HORNER_9_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_11_(pg, x, c, i) \
+  FMA (pg, HORNER_10_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+#define HORNER_12_(pg, x, c, i) \
+  FMA (pg, HORNER_11_ (pg, x, c, i + 1), x, VECTOR (c (i)))
+
+#define HORNER_1(pg, x, c) HORNER_1_ (pg, x, c, 0)
+#define HORNER_2(pg, x, c) HORNER_2_ (pg, x, c, 0)
+#define HORNER_3(pg, x, c) HORNER_3_ (pg, x, c, 0)
+#define HORNER_4(pg, x, c) HORNER_4_ (pg, x, c, 0)
+#define HORNER_5(pg, x, c) HORNER_5_ (pg, x, c, 0)
+#define HORNER_6(pg, x, c) HORNER_6_ (pg, x, c, 0)
+#define HORNER_7(pg, x, c) HORNER_7_ (pg, x, c, 0)
+#define HORNER_8(pg, x, c) HORNER_8_ (pg, x, c, 0)
+#define HORNER_9(pg, x, c) HORNER_9_ (pg, x, c, 0)
+#define HORNER_10(pg, x, c) HORNER_10_ (pg, x, c, 0)
+#define HORNER_11(pg, x, c) HORNER_11_ (pg, x, c, 0)
+#define HORNER_12(pg, x, c) HORNER_12_ (pg, x, c, 0)
diff --git a/sysdeps/aarch64/fpu/sv_hornerf.h b/sysdeps/aarch64/fpu/sv_hornerf.h
new file mode 100644
index 0000000000..146c117019
--- /dev/null
+++ b/sysdeps/aarch64/fpu/sv_hornerf.h
@@ -0,0 +1,24 @@
+/* Helper macros for single-precision Horner polynomial evaluation
+   in SVE routines.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#define FMA(pg, x, y, z) svmla_f32_x (pg, z, x, y)
+#define VECTOR sv_f32
+
+#include "sv_horner_wrap.h"
diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
index cb45fd3298..4af97a25a2 100644
--- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
@@ -24,3 +24,4 @@
 #define VEC_TYPE float64x2_t
 
 VPCS_VECTOR_WRAPPER (cos_advsimd, _ZGVnN2v_cos)
+VPCS_VECTOR_WRAPPER (sin_advsimd, _ZGVnN2v_sin)
diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
index cf72ef83b7..64c790adc5 100644
--- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
@@ -33,3 +33,4 @@
 }
 
 SVE_VECTOR_WRAPPER (cos_sve, _ZGVsMxv_cos)
+SVE_VECTOR_WRAPPER (sin_sve, _ZGVsMxv_sin)
diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
index fa146862b0..50e776b952 100644
--- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
@@ -24,3 +24,4 @@
 #define VEC_TYPE float32x4_t
 
 VPCS_VECTOR_WRAPPER (cosf_advsimd, _ZGVnN4v_cosf)
+VPCS_VECTOR_WRAPPER (sinf_advsimd, _ZGVnN4v_sinf)
diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
index bc26558c62..7355032929 100644
--- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
@@ -33,3 +33,4 @@
 }
 
 SVE_VECTOR_WRAPPER (cosf_sve, _ZGVsMxv_cosf)
+SVE_VECTOR_WRAPPER (sinf_sve, _ZGVsMxv_sinf)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 07da4ab843..4145662b2d 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1257,11 +1257,19 @@ double: 1
 float: 1
 ldouble: 2
 
+Function: "sin_advsimd":
+double: 2
+float: 1
+
 Function: "sin_downward":
 double: 1
 float: 1
 ldouble: 3
 
+Function: "sin_sve":
+double: 2
+float: 1
+
 Function: "sin_towardzero":
 double: 1
 float: 1
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index 13af421af2..a4c564859c 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -1,4 +1,8 @@
 GLIBC_2.38 _ZGVnN2v_cos F
+GLIBC_2.38 _ZGVnN2v_sin F
 GLIBC_2.38 _ZGVnN4v_cosf F
+GLIBC_2.38 _ZGVnN4v_sinf F
 GLIBC_2.38 _ZGVsMxv_cos F
 GLIBC_2.38 _ZGVsMxv_cosf F
+GLIBC_2.38 _ZGVsMxv_sin F
+GLIBC_2.38 _ZGVsMxv_sinf F