From patchwork Wed Oct 4 09:37:50 2023
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 77085
X-Original-To: libc-alpha@sourceware.org
From: Joe Ramsay
Subject: [PATCH] aarch64: Optimize SVE cos & cosf
Date: Wed, 4 Oct 2023 10:37:50 +0100
Message-ID: <20231004093750.48645-1-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0

Saves a mov by ensuring the return value does not need to be moved out
of the way before the special-case branch. Also change to use
overloaded intrinsics.
---
Thanks,
Joe

 sysdeps/aarch64/fpu/cos_sve.c  | 53 +++++++++++++++++-----------------
 sysdeps/aarch64/fpu/cosf_sve.c | 47 ++++++++++++++----------------
 2 files changed, 47 insertions(+), 53 deletions(-)

diff --git a/sysdeps/aarch64/fpu/cos_sve.c b/sysdeps/aarch64/fpu/cos_sve.c
index d7a9b134da..c804e52fc6 100644
--- a/sysdeps/aarch64/fpu/cos_sve.c
+++ b/sysdeps/aarch64/fpu/cos_sve.c
@@ -37,9 +37,9 @@ static const struct data
 #define RangeVal 0x4160000000000000 /* asuint64 (0x1p23).  */
 
 static svfloat64_t NOINLINE
-special_case (svfloat64_t x, svfloat64_t y, svbool_t out_of_bounds)
+special_case (svfloat64_t x, svfloat64_t y, svbool_t oob)
 {
-  return sv_call_f64 (cos, x, y, out_of_bounds);
+  return sv_call_f64 (cos, x, y, oob);
 }
 
 /* A fast SVE implementation of cos based on trigonometric
@@ -51,42 +51,41 @@ svfloat64_t SV_NAME_D1 (cos) (svfloat64_t x, const svbool_t pg)
 {
   const struct data *d = ptr_barrier (&data);
 
-  svfloat64_t r = svabs_f64_x (pg, x);
-  svbool_t out_of_bounds
-      = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal);
+  svfloat64_t r = svabs_x (pg, x);
+  svbool_t oob = svcmpge (pg, svreinterpret_u64 (r), RangeVal);
 
   /* Load some constants in quad-word chunks to minimise memory access.  */
   svbool_t ptrue = svptrue_b64 ();
-  svfloat64_t invpio2_and_pio2_1 = svld1rq_f64 (ptrue, &d->inv_pio2);
-  svfloat64_t pio2_23 = svld1rq_f64 (ptrue, &d->pio2_2);
+  svfloat64_t invpio2_and_pio2_1 = svld1rq (ptrue, &d->inv_pio2);
+  svfloat64_t pio2_23 = svld1rq (ptrue, &d->pio2_2);
 
   /* n = rint(|x|/(pi/2)).  */
-  svfloat64_t q = svmla_lane_f64 (sv_f64 (d->shift), r, invpio2_and_pio2_1, 0);
-  svfloat64_t n = svsub_n_f64_x (pg, q, d->shift);
+  svfloat64_t q = svmla_lane (sv_f64 (d->shift), r, invpio2_and_pio2_1, 0);
+  svfloat64_t n = svsub_x (pg, q, d->shift);
 
   /* r = |x| - n*(pi/2) (range reduction into -pi/4 .. pi/4).  */
-  r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1);
-  r = svmls_lane_f64 (r, n, pio2_23, 0);
-  r = svmls_lane_f64 (r, n, pio2_23, 1);
+  r = svmls_lane (r, n, invpio2_and_pio2_1, 1);
+  r = svmls_lane (r, n, pio2_23, 0);
+  r = svmls_lane (r, n, pio2_23, 1);
 
   /* cos(r) poly approx.  */
-  svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q));
+  svfloat64_t r2 = svtsmul (r, svreinterpret_u64 (q));
   svfloat64_t y = sv_f64 (0.0);
-  y = svtmad_f64 (y, r2, 7);
-  y = svtmad_f64 (y, r2, 6);
-  y = svtmad_f64 (y, r2, 5);
-  y = svtmad_f64 (y, r2, 4);
-  y = svtmad_f64 (y, r2, 3);
-  y = svtmad_f64 (y, r2, 2);
-  y = svtmad_f64 (y, r2, 1);
-  y = svtmad_f64 (y, r2, 0);
+  y = svtmad (y, r2, 7);
+  y = svtmad (y, r2, 6);
+  y = svtmad (y, r2, 5);
+  y = svtmad (y, r2, 4);
+  y = svtmad (y, r2, 3);
+  y = svtmad (y, r2, 2);
+  y = svtmad (y, r2, 1);
+  y = svtmad (y, r2, 0);
 
   /* Final multiplicative factor: 1.0 or x depending on bit #0 of q.  */
-  svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q));
-  /* Apply factor.  */
-  y = svmul_f64_x (pg, f, y);
+  svfloat64_t f = svtssel (r, svreinterpret_u64 (q));
+
+  if (__glibc_unlikely (svptest_any (pg, oob)))
+    return special_case (x, svmul_x (svnot_z (pg, oob), y, f), oob);
 
-  if (__glibc_unlikely (svptest_any (pg, out_of_bounds)))
-    return special_case (x, y, out_of_bounds);
-  return y;
+  /* Apply factor.  */
+  return svmul_x (pg, f, y);
 }
diff --git a/sysdeps/aarch64/fpu/cosf_sve.c b/sysdeps/aarch64/fpu/cosf_sve.c
index 577cbd864e..a0be56ec7e 100644
--- a/sysdeps/aarch64/fpu/cosf_sve.c
+++ b/sysdeps/aarch64/fpu/cosf_sve.c
@@ -37,9 +37,9 @@ static const struct data
 #define RangeVal 0x49800000 /* asuint32(0x1p20f).  */
 
 static svfloat32_t NOINLINE
-special_case (svfloat32_t x, svfloat32_t y, svbool_t out_of_bounds)
+special_case (svfloat32_t x, svfloat32_t y, svbool_t oob)
 {
-  return sv_call_f32 (cosf, x, y, out_of_bounds);
+  return sv_call_f32 (cosf, x, y, oob);
 }
 
 /* A fast SVE implementation of cosf based on trigonometric
@@ -51,40 +51,35 @@ svfloat32_t SV_NAME_F1 (cos) (svfloat32_t x, const svbool_t pg)
 {
   const struct data *d = ptr_barrier (&data);
 
-  svfloat32_t r = svabs_f32_x (pg, x);
-  svbool_t out_of_bounds
-      = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (r), RangeVal);
+  svfloat32_t r = svabs_x (pg, x);
+  svbool_t oob = svcmpge (pg, svreinterpret_u32 (r), RangeVal);
 
   /* Load some constants in quad-word chunks to minimise memory access.  */
-  svfloat32_t negpio2_and_invpio2
-      = svld1rq_f32 (svptrue_b32 (), &d->neg_pio2_1);
+  svfloat32_t negpio2_and_invpio2 = svld1rq (svptrue_b32 (), &d->neg_pio2_1);
 
   /* n = rint(|x|/(pi/2)).  */
-  svfloat32_t q
-      = svmla_lane_f32 (sv_f32 (d->shift), r, negpio2_and_invpio2, 3);
-  svfloat32_t n = svsub_n_f32_x (pg, q, d->shift);
+  svfloat32_t q = svmla_lane (sv_f32 (d->shift), r, negpio2_and_invpio2, 3);
+  svfloat32_t n = svsub_x (pg, q, d->shift);
 
   /* r = |x| - n*(pi/2) (range reduction into -pi/4 .. pi/4).  */
-  r = svmla_lane_f32 (r, n, negpio2_and_invpio2, 0);
-  r = svmla_lane_f32 (r, n, negpio2_and_invpio2, 1);
-  r = svmla_lane_f32 (r, n, negpio2_and_invpio2, 2);
+  r = svmla_lane (r, n, negpio2_and_invpio2, 0);
+  r = svmla_lane (r, n, negpio2_and_invpio2, 1);
+  r = svmla_lane (r, n, negpio2_and_invpio2, 2);
 
   /* Final multiplicative factor: 1.0 or x depending on bit #0 of q.  */
-  svfloat32_t f = svtssel_f32 (r, svreinterpret_u32_f32 (q));
+  svfloat32_t f = svtssel (r, svreinterpret_u32 (q));
 
   /* cos(r) poly approx.  */
-  svfloat32_t r2 = svtsmul_f32 (r, svreinterpret_u32_f32 (q));
+  svfloat32_t r2 = svtsmul (r, svreinterpret_u32 (q));
   svfloat32_t y = sv_f32 (0.0f);
-  y = svtmad_f32 (y, r2, 4);
-  y = svtmad_f32 (y, r2, 3);
-  y = svtmad_f32 (y, r2, 2);
-  y = svtmad_f32 (y, r2, 1);
-  y = svtmad_f32 (y, r2, 0);
-
+  y = svtmad (y, r2, 4);
+  y = svtmad (y, r2, 3);
+  y = svtmad (y, r2, 2);
+  y = svtmad (y, r2, 1);
+  y = svtmad (y, r2, 0);
+
+  if (__glibc_unlikely (svptest_any (pg, oob)))
+    return special_case (x, svmul_x (svnot_z (pg, oob), f, y), oob);
   /* Apply factor.  */
-  y = svmul_f32_x (pg, f, y);
-
-  if (__glibc_unlikely (svptest_any (pg, out_of_bounds)))
-    return special_case (x, y, out_of_bounds);
-  return y;
+  return svmul_x (pg, f, y);
 }
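
The out-of-bounds test compares bit patterns rather than floating-point values: `svcmpge (pg, svreinterpret_u64 (r), RangeVal)` treats |x| as an unsigned integer and compares it with `0x4160000000000000`, the encoding of 0x1p23 (cosf does the same with `0x49800000`, the encoding of 0x1p20f). A minimal scalar C sketch of that predicate follows; `asuint64` here is a local stand-in for the reinterpret, and `is_oob` is a hypothetical name used only for illustration.

```c
#include <math.h>   /* INFINITY and NAN, used in the usage note.  */
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a uint64_t (scalar stand-in
   for svreinterpret_u64).  */
static uint64_t
asuint64 (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Scalar model of oob = asuint64 (|x|) >= RangeVal.  Clearing the
   sign bit gives |x|.  For non-negative doubles, unsigned ordering of
   the bit patterns matches numeric ordering, and Inf/NaN encodings lie
   above every finite value, so they are flagged as well.  */
static int
is_oob (double x)
{
  const uint64_t range_val = 0x4160000000000000; /* asuint64 (0x1p23).  */
  uint64_t abs_bits = asuint64 (x) & 0x7fffffffffffffff;
  return abs_bits >= range_val;
}
```

With this sketch, `is_oob (1.0)` is zero while `is_oob (0x1p23)`, `is_oob (INFINITY)` and `is_oob (NAN)` are all nonzero, so one integer compare routes both huge and non-finite inputs to the scalar fallback.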
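
The range reduction computes n = rint(|x|/(pi/2)) with a shift trick and then subtracts n*(pi/2) in three pieces. The scalar sketch below assumes its own constants (2/pi, a three-part split of pi/2 and the shift 0x1.8p52); they illustrate the technique and are not copied from the glibc data table. It also uses separate multiplies and adds where the SVE code uses fused `svmla_lane`/`svmls_lane`, so results may differ in the last bit. The function name `reduce_pio2` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed constants for this sketch, not taken from glibc.  */
static const double inv_pio2 = 0x1.45f306dc9c883p-1;  /* 2/pi.  */
static const double pio2_1 = 0x1.921fb5p0;            /* High 24 bits of pi/2.  */
static const double pio2_2 = 0x1.110b46p-26;          /* Rest of double (pi/2).  */
static const double pio2_3 = 0x1.1a62633145c07p-54;   /* pi/2 - double (pi/2).  */
static const double shift = 0x1.8p52;

/* Reduce ax (assumed 0 <= ax < 0x1p23, i.e. not oob) to
   r = ax - n*(pi/2) with n = rint (ax / (pi/2)).  Also returns n and
   the raw bit pattern of q.  */
static double
reduce_pio2 (double ax, int64_t *n_out, uint64_t *q_bits)
{
  /* Adding 0x1.8p52 leaves no room for fraction bits in the
     significand, so the sum is rounded to an integer.  */
  double q = shift + ax * inv_pio2;
  double n = q - shift;

  /* Subtract n*(pi/2) one piece at a time.  */
  double r = ax - n * pio2_1;
  r = r - n * pio2_2;
  r = r - n * pio2_3;

  memcpy (q_bits, &q, sizeof *q_bits);
  *n_out = (int64_t) n;
  return r;
}
```

For ax = 10.0 this gives n = 6 and r close to 10 - 3*pi (about 0.5752). Because the ulp of q is 1, the low bits of q's bit pattern hold n, which is presumably why the patch can pass `svreinterpret_u64 (q)` straight to `svtsmul` and `svtssel`.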
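
Together, `svtsmul`, the `svtmad` chain and `svtssel` evaluate cos(r + n*pi/2) from the reduced argument and the quadrant. The sketch below is a mathematical model only: it uses Taylor coefficients as stand-ins for the architecture's FTMAD coefficient tables and makes the sin/cos and sign choices explicitly instead of reproducing the instructions' bit-level behaviour. The names `horner` and `cos_from_reduced` are hypothetical.

```c
#include <stdint.h>

/* Taylor coefficients in r^2, standing in for the hardware tables
   (an assumption of this sketch; the real coefficients differ).  */
static const double cos_c[8] = {
  1.0, -1.0 / 2, 1.0 / 24, -1.0 / 720, 1.0 / 40320,
  -1.0 / 3628800, 1.0 / 479001600, -1.0 / 87178291200.0,
};
static const double sin_c[8] = {
  1.0, -1.0 / 6, 1.0 / 120, -1.0 / 5040, 1.0 / 362880,
  -1.0 / 39916800, 1.0 / 6227020800.0, -1.0 / 1307674368000.0,
};

/* Horner evaluation starting from y = 0 with the highest coefficient,
   mirroring y = svtmad (y, r2, 7) ... svtmad (y, r2, 0).  */
static double
horner (const double c[8], double r2)
{
  double y = 0.0;
  for (int i = 7; i >= 0; i--)
    y = y * r2 + c[i];
  return y;
}

/* cos (r + n*pi/2) for |r| <= pi/4, using only n mod 4:
   k = 0: cos r, k = 1: -sin r, k = 2: -cos r, k = 3: sin r.  */
static double
cos_from_reduced (double r, int64_t n)
{
  unsigned k = (unsigned) (n & 3);
  double r2 = r * r;
  /* Odd quadrants need sin r = r * S(r^2): factor r.
     Even quadrants need cos r = C(r^2): factor 1.0.  */
  double f = (k & 1) ? r : 1.0;
  double p = horner ((k & 1) ? sin_c : cos_c, r2);
  double y = f * p;
  /* Quadrants 1 and 2 negate the result.  */
  return (k == 1 || k == 2) ? -y : y;
}
```

Feeding it r of about 0.5752 with n = 6 (the reduction of 10.0) yields approximately cos(10) = -0.83907, and r = 1 - pi/2 with n = 1 yields approximately cos(1) = 0.54030.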
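
The functional change in the patch is in the tail. When any lane is out of bounds, the final multiply is performed under `svnot_z (pg, oob)`, that is, only on active in-range lanes, and its result is passed straight to `special_case`, which (as I understand `sv_call_f64`) overwrites the oob lanes with scalar `cos` results. The scalar model below shows that data flow with a hypothetical 4-lane vector; `call_scalar`, `cos_tail` and `sentinel` are illustrative names. It cannot show the saved mov itself, which only appears in the generated SVE code.

```c
#include <stdbool.h>
#include <stddef.h>

#define VL 4 /* Hypothetical vector length, in lanes.  */

typedef double (*scalar_fn) (double);

/* Model of sv_call_f64 (fn, x, y, oob): lanes set in oob get the
   scalar result; all other lanes of y pass through unchanged.  */
static void
call_scalar (scalar_fn fn, const double x[VL], double y[VL],
             const bool oob[VL])
{
  for (size_t i = 0; i < VL; i++)
    if (oob[i])
      y[i] = fn (x[i]);
}

/* Model of the patched tail: p is the polynomial result, f the factor.  */
static void
cos_tail (const double x[VL], const double p[VL], const double f[VL],
          const bool pg[VL], const bool oob[VL], scalar_fn fallback,
          double out[VL])
{
  bool any_oob = false;
  for (size_t i = 0; i < VL; i++)
    any_oob |= pg[i] && oob[i];

  if (any_oob)
    {
      /* svmul_x (svnot_z (pg, oob), y, f): only lanes in pg and not in
         oob are multiplied.  Other lanes are "don't care" under _x
         predication; this model leaves them as p.  */
      for (size_t i = 0; i < VL; i++)
        out[i] = (pg[i] && !oob[i]) ? p[i] * f[i] : p[i];
      call_scalar (fallback, x, out, oob);
      return;
    }

  /* Common path: svmul_x (pg, f, y).  Inactive lanes are don't-care,
     so the model simply multiplies every lane.  */
  for (size_t i = 0; i < VL; i++)
    out[i] = f[i] * p[i];
}

/* Hypothetical stand-in for the scalar cos, returning a sentinel so
   lanes routed through the fallback are easy to spot.  */
static double
sentinel (double v)
{
  (void) v;
  return -42.0;
}
```

With `oob` set only in lane 2, lanes 0, 1 and 3 come out as `p[i] * f[i]` and lane 2 holds the fallback result; with no lane set, every lane takes the common path.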
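
The other half of the patch replaces explicitly typed ACLE names such as `svabs_f64_x` and `svcmpge_n_u64` with their overloaded forms (`svabs_x`, `svcmpge`), where the typed variant is chosen from the argument types. ACLE overload resolution is done by the compiler itself; the portable C11 `_Generic` sketch below is only an analogy for the same idea, and `abs_f64`, `abs_f32` and `abs_any` are hypothetical names.

```c
/* Two explicitly typed helpers, analogous to svabs_f64_x and
   svabs_f32_x.  (Sign handling of -0.0 is ignored here.)  */
static double
abs_f64 (double v)
{
  return v < 0 ? -v : v;
}

static float
abs_f32 (float v)
{
  return v < 0 ? -v : v;
}

/* One generic name resolved from the argument type at compile time,
   analogous to the overloaded svabs_x.  */
#define abs_any(v) _Generic ((v), double: abs_f64, float: abs_f32) (v)
```

`abs_any (-2.5)` calls `abs_f64` and returns a `double`, while `abs_any (-2.5f)` calls `abs_f32` and returns a `float`, which is the same kind of type-directed selection the overloaded intrinsics perform.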