From patchwork Wed Nov 17 15:58:53 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 47813
To: Adhemerval Zanella
Subject: RFC: Improve hypot performance
Date: Wed, 17 Nov 2021 15:58:53 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

Hi Adhemerval,

Here is an early version of a much faster hypot implementation. It uses fma
to significantly outperform the powerpc version in both throughput and
latency. It has a worst-case error of ~0.949 ULP and passes the testsuite,
whereas the powerpc version has a worst-case error of ~1.21 ULP and several
test failures. It applies on top of your hypot patch series.

I didn't optimize the non-fma case since modern targets have fma. It'll be
interesting to compare it on Power. You'll need to set FAST_FMINMAX correctly
to indicate support for inlined fmin/fmax instructions (this will be added to
math_private.h for targets that have it). Without it, the code tries to use a
conditional move, since a branch here is really bad for performance.
Cheers,
Wilco

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index d20bc3e3657e350a1103a8f8477db35ee60399e0..3906711788cff5d66725e5879bb4d6c36dd24dc7 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -37,73 +37,40 @@
 #include
 #include
 
+#define FAST_FMINMAX 1
+//#undef __FP_FAST_FMA
+
+#define SCALE 0x1p-600
+#define LARGE_VAL 0x1p+511
+#define TINY_VAL 0x1p-459
+#define EPS 0x1p-54
+
+
 static inline double
 handle_errno (double r)
 {
+  r = math_narrow_eval (r);
   if (isinf (r))
     __set_errno (ERANGE);
   return r;
 }
 
-/* sqrt (DBL_EPSILON / 2.0) */
-#define SQRT_EPS_DIV_2 0x1.6a09e667f3bcdp-27
-/* DBL_MIN / (sqrt (DBL_EPSILON / 2.0)) */
-#define DBL_MIN_THRESHOLD 0x1.6a09e667f3bcdp-996
-/* eps (double) * sqrt (DBL_MIN)) */
-#define SCALE 0x1p-563
-/* 1 / eps (sqrt (DBL_MIN) */
-#define INV_SCALE 0x1p+563
-/* sqrt (DBL_MAX) */
-#define SQRT_DBL_MAX 0x1.6a09e667f3bccp+511
-/* sqrt (DBL_MIN) */
-#define SQRT_DBL_MIN 0x1p-511
-
-double
-__hypot (double x, double y)
+static inline double
+kernel (double ax, double ay)
 {
-  if ((isinf (x) || isinf (y))
-      && !issignaling (x) && !issignaling (y))
-    return INFINITY;
-  if (isnan (x) || isnan (y))
-    return x + y;
-
-  double ax = fabs (x);
-  double ay = fabs (y);
-  if (ay > ax)
-    {
-      double tmp = ax;
-      ax = ay;
-      ay = tmp;
-    }
-
-  /* Widely varying operands. The DBL_MIN_THRESHOLD check is used to avoid
-     a spurious underflow from the multiplication. */
-  if (ax >= DBL_MIN_THRESHOLD && ay <= ax * SQRT_EPS_DIV_2)
-    return (ay == 0.0)
-           ? ax
-           : handle_errno (math_narrow_eval (ax + DBL_TRUE_MIN));
+  double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
 
-  double scale = SCALE;
-  if (ax > SQRT_DBL_MAX)
-    {
-      ax *= scale;
-      ay *= scale;
-      scale = INV_SCALE;
-    }
-  else if (ay < SQRT_DBL_MIN)
-    {
-      ax /= scale;
-      ay /= scale;
-    }
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
   else
-    scale = 1.0;
-
+    return sqrt (fma (ax, ax, ay * ay));
+#else
   double h = sqrt (ax * ax + ay * ay);
-  double t1, t2;
-  if (h == 0.0)
-    return h;
-  else if (h <= 2.0 * ay)
+  if (h <= 2.0 * ay)
     {
       double delta = h - ay;
       t1 = ax * (2.0 * delta - ax);
@@ -112,14 +79,57 @@ __hypot (double x, double y)
   else
     {
       double delta = h - ax;
-      t1 = 2.0 * delta * (ax - 2 * ay);
+      t1 = 2.0 * delta * (ax - 2.0 * ay);
       t2 = (4.0 * delta - ay) * ay + delta * delta;
     }
   h -= (t1 + t2) / (2.0 * h);
-  h = math_narrow_eval (h * scale);
-  math_check_force_underflow_nonneg (h);
-  return handle_errno (h);
+  return h;
+#endif
+}
+
+
+double
+__hypot (double x, double y)
+{
+  if (!isfinite (x) || !isfinite (y))
+    {
+      if ((isinf (x) || isinf (y))
+          && !issignaling_inline (x) && !issignaling_inline (y))
+        return INFINITY;
+      return x + y;
+    }
+
+  x = fabs (x);
+  y = fabs (y);
+
+  double ax = FAST_FMINMAX ? fmax (x, y) : (x < y ? y : x);
+  double ay = FAST_FMINMAX ? fmin (x, y) : (x < y ? x : y);
+
+  if (__glibc_unlikely (ax > LARGE_VAL))
+    {
+      if (__glibc_unlikely (ay <= ax * EPS))
+        return handle_errno (ax + ay);
+
+      return handle_errno (kernel (ax * SCALE, ay * SCALE) / SCALE);
+    }
+
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+        return math_narrow_eval (ax + ay);
+
+      ax = math_narrow_eval (kernel (ax / SCALE, ay / SCALE) * SCALE);
+      math_check_force_underflow_nonneg (ax);
+      return ax;
+    }
+
+  /* Common case: ax is not huge and ay is not tiny. */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return math_narrow_eval (ax + ay);
+
+  return math_narrow_eval (kernel (ax, ay));
 }
+
 strong_alias (__hypot, __ieee754_hypot)
 libm_alias_finite (__ieee754_hypot, __hypot)
 #if LIBM_SVID_COMPAT