From patchwork Fri Dec 3 00:00:52 2021
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 48438
From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Paul Zimmermann, Wilco Dijkstra
Subject: [PATCH v4 01/12] math: Simplify hypotf implementation
Date: Thu, 2 Dec 2021 21:00:52 -0300
Message-Id: <20211203000103.737833-2-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
References: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list

Use a more optimized comparison to check for NaN and infinity, and add an
inlined issignaling implementation for float.  With gcc this results in
2 FP comparisons.

The file copyright is also changed to use the LGPL: the implementation was
completely changed by 7c10fd3515f to use double precision instead of
scaling, and this change removes all the GET_FLOAT_WORD usage.

Checked on x86_64-linux-gnu.
---
 sysdeps/ieee754/flt-32/e_hypotf.c    | 64 +++++++++++++---------------
 sysdeps/ieee754/flt-32/math_config.h |  9 ++++
 2 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c
index e770947dc1..1d082fe36c 100644
--- a/sysdeps/ieee754/flt-32/e_hypotf.c
+++ b/sysdeps/ieee754/flt-32/e_hypotf.c
@@ -1,46 +1,40 @@
-/* e_hypotf.c -- float version of e_hypot.c.
- */
+/* Euclidean distance function.  Float/Binary32 version.
+   Copyright (C) 2012-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
 
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
 
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include
 #include
+#include "math_config.h"
+#include
 #include
-#include
 
 float
-__ieee754_hypotf(float x, float y)
+__ieee754_hypotf (float x, float y)
 {
-	double d_x, d_y;
-	int32_t ha, hb;
-
-	GET_FLOAT_WORD(ha,x);
-	ha &= 0x7fffffff;
-	GET_FLOAT_WORD(hb,y);
-	hb &= 0x7fffffff;
-	if (ha == 0x7f800000 && !issignaling (y))
-		return fabsf(x);
-	else if (hb == 0x7f800000 && !issignaling (x))
-		return fabsf(y);
-	else if (ha > 0x7f800000 || hb > 0x7f800000)
-		return fabsf(x) * fabsf(y);
-	else if (ha == 0)
-		return fabsf(y);
-	else if (hb == 0)
-		return fabsf(x);
-
-	d_x = (double) x;
-	d_y = (double) y;
+  if (!isfinite(x) || !isfinite(y))
+    {
+      if ((isinf (x) || isinf (y))
+          && !issignalingf_inline (x) && !issignalingf_inline (y))
+        return INFINITY;
+      return x + y;
+    }
 
-	return (float) sqrt(d_x * d_x + d_y * d_y);
+  return math_narrow_eval (sqrt ((double) x * (double) x
+                                 + (double) y * (double) y));
 }
 #ifndef __ieee754_hypotf
 libm_alias_finite (__ieee754_hypotf, __hypotf)
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index 513454a297..daa8a82f99 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -101,6 +101,15 @@ asdouble (uint64_t i)
   return u.f;
 }
 
+static inline int
+issignalingf_inline (float x)
+{
+  uint32_t ix = asuint (x);
+  if (HIGH_ORDER_BIT_IS_SET_FOR_SNAN)
+    return (ix & 0x7fc00000) == 0x7fc00000;
+  return 2 * (ix ^ 0x00400000) > 2 * 0x7fc00000UL;
+}
+
 #define NOINLINE __attribute__ ((noinline))
 
 attribute_hidden float __math_oflowf (uint32_t);

From patchwork Fri Dec 3 00:00:53 2021
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 48439
From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Paul Zimmermann, Wilco Dijkstra
Subject: [PATCH v4 02/12] math: Use an improved algorithm for hypot (dbl-64)
Date: Thu, 2 Dec 2021 21:00:53 -0300
Message-Id: <20211203000103.737833-3-adhemerval.zanella@linaro.org>
In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
References: <20211203000103.737833-1-adhemerval.zanella@linaro.org>

From: Wilco Dijkstra

This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1], using the MyHypot3 variant with the following
changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' case to avoid spurious underflow
   due to the multiplication, and fix the return value for upward
   rounding mode.
 - Handle the required underflow exception for denormal results.

The main advantage of the new algorithm is its precision: with 1e9 random
input pairs in the range [DBL_MIN, DBL_MAX], the current glibc
implementation shows around 0.34% of results with an error of 1 ulp
(3424869 results), while the new implementation shows only 0.002%
(18851).

Performance is also only slightly worse than the current implementation.
On x86_64 (Ryzen 5900X) with gcc 12:

Before:

  "hypot": {
   "workload-random": {
    "duration": 3.73319e+09,
    "iterations": 1.12e+08,
    "reciprocal-throughput": 22.8737,
    "latency": 43.7904,
    "max-throughput": 4.37184e+07,
    "min-throughput": 2.28361e+07
   }
  }

After:

  "hypot": {
   "workload-random": {
    "duration": 3.7597e+09,
    "iterations": 9.8e+07,
    "reciprocal-throughput": 23.7547,
    "latency": 52.9739,
    "max-throughput": 4.2097e+07,
    "min-throughput": 1.88772e+07
   }
  }

Co-Authored-By: Adhemerval Zanella

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 235 ++++++++++++-------------------
 1 file changed, 92 insertions(+), 143 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 9ec4c1ced0..274b14b57e 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -1,164 +1,113 @@
-/* @(#)e_hypot.c 5.1 93/09/24 */
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypot(x,y)
- *
- * Method :
- *	If (assume round-to-nearest) z=x*x+y*y
- *	has error less than sqrt(2)/2 ulp, than
- *	sqrt(z) has error less than 1 ulp (exercise).
- *
- *	So, compute sqrt(x*x+y*y) with some care as
- *	follows to get the error below 1 ulp:
- *
- *	Assume x>y>0;
- *	(if possible, set rounding to round-to-nearest)
- *	1. if x > 2y use
- *		x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *	where x1 = x with lower 32 bits cleared, x2 = x-x1; else
- *	2. if x <= 2y use
- *		t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
- *	where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
- *	y1= y with lower 32 bits chopped, y2 = y-y1.
- *
- *	NOTE: scaling may be necessary if some argument is too
- *	      large or too tiny
- *
- *	Special cases:
- *	hypot(x,y) is INF if x or y is +INF or -INF; else
- *	hypot(x,y) is NAN if x or y is NAN.
- *
- * Accuracy:
- *	hypot(x,y) returns sqrt(x^2+y^2) with error less
- *	than 1 ulps (units in the last place)
- */
+/* Euclidean distance function.  Double/Binary64 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* The implementation uses a correction based on 'An Improved Algorithm for
+   hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following
+   changes:
+
+   - Handle qNaN and sNaN.
+   - Tune the 'widely varying operands' to avoid spurious underflow
+     due to the multiplication and fix the return value for upwards
+     rounding mode.
+   - Handle required underflow exception for subnormal results.
+
+   The expected ULP is ~0.792.
+
+   [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
 #include
 #include
 #include
+#include
 #include
+#include "math_config.h"
 
-double
-__ieee754_hypot (double x, double y)
+#define SCALE     0x1p-600
+#define LARGE_VAL 0x1p+511
+#define TINY_VAL  0x1p-459
+#define EPS       0x1p-54
+
+/* Hypot kernel.  The inputs must be adjusted so that ax >= ay >= 0
+   and squaring ax, ay and (ax - ay) does not overflow or underflow.  */
+static inline double
+kernel (double ax, double ay)
 {
-  double a, b, t1, t2, y1, y2, w;
-  int32_t j, k, ha, hb;
-
-  GET_HIGH_WORD (ha, x);
-  ha &= 0x7fffffff;
-  GET_HIGH_WORD (hb, y);
-  hb &= 0x7fffffff;
-  if (hb > ha)
+  double t1, t2;
+  double h = sqrt (ax * ax + ay * ay);
+  if (h <= 2.0 * ay)
     {
-      a = y; b = x; j = ha; ha = hb; hb = j;
+      double delta = h - ay;
+      t1 = ax * (2.0 * delta - ax);
+      t2 = (delta - 2.0 * (ax - ay)) * delta;
     }
   else
     {
-      a = x; b = y;
-    }
-  SET_HIGH_WORD (a, ha);  /* a <- |a| */
-  SET_HIGH_WORD (b, hb);  /* b <- |b| */
-  if ((ha - hb) > 0x3c00000)
-    {
-      return a + b;
-    }  /* x/y > 2**60 */
-  k = 0;
-  if (__glibc_unlikely (ha > 0x5f300000))  /* a>2**500 */
-    {
-      if (ha >= 0x7ff00000)  /* Inf or NaN */
-        {
-          uint32_t low;
-          w = a + b;  /* for sNaN */
-          if (issignaling (a) || issignaling (b))
-            return w;
-          GET_LOW_WORD (low, a);
-          if (((ha & 0xfffff) | low) == 0)
-            w = a;
-          GET_LOW_WORD (low, b);
-          if (((hb ^ 0x7ff00000) | low) == 0)
-            w = b;
-          return w;
-        }
-      /* scale a and b by 2**-600 */
-      ha -= 0x25800000; hb -= 0x25800000; k += 600;
-      SET_HIGH_WORD (a, ha);
-      SET_HIGH_WORD (b, hb);
-    }
-  if (__builtin_expect (hb < 0x23d00000, 0))  /* b < 2**-450 */
-    {
-      if (hb <= 0x000fffff)  /* subnormal b or 0 */
-        {
-          uint32_t low;
-          GET_LOW_WORD (low, b);
-          if ((hb | low) == 0)
-            return a;
-          t1 = 0;
-          SET_HIGH_WORD (t1, 0x7fd00000);  /* t1=2^1022 */
-          b *= t1;
-          a *= t1;
-          k -= 1022;
-          GET_HIGH_WORD (ha, a);
-          GET_HIGH_WORD (hb, b);
-          if (hb > ha)
-            {
-              t1 = a;
-              a = b;
-              b = t1;
-              j = ha;
-              ha = hb;
-              hb = j;
-            }
-        }
-      else  /* scale a and b by 2^600 */
-        {
-          ha += 0x25800000;  /* a *= 2^600 */
-          hb += 0x25800000;  /* b *= 2^600 */
-          k -= 600;
-          SET_HIGH_WORD (a, ha);
-          SET_HIGH_WORD (b, hb);
-        }
+      double delta = h - ax;
+      t1 = 2.0 * delta * (ax - 2.0 * ay);
+      t2 = (4.0 * delta - ay) * ay + delta * delta;
     }
-  /* medium size a and b */
-  w = a - b;
-  if (w > b)
+
+  h -= (t1 + t2) / (2.0 * h);
+  return h;
+}
+
+double
+__ieee754_hypot (double x, double y)
+{
+  if (!isfinite(x) || !isfinite(y))
     {
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha);
-      t2 = a - t1;
-      w = sqrt (t1 * t1 - (b * (-b) - t2 * (a + t1)));
+      if ((isinf (x) || isinf (y))
+          && !issignaling_inline (x) && !issignaling_inline (y))
+        return INFINITY;
+      return x + y;
     }
-  else
+
+  x = fabs (x);
+  y = fabs (y);
+
+  double ax = x < y ? y : x;
+  double ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
     {
-      a = a + a;
-      y1 = 0;
-      SET_HIGH_WORD (y1, hb);
-      y2 = b - y1;
-      t1 = 0;
-      SET_HIGH_WORD (t1, ha + 0x00100000);
-      t2 = a - t1;
-      w = sqrt (t1 * y1 - (w * (-w) - (t1 * y2 + t2 * b)));
+      if (__glibc_unlikely (ay <= ax * EPS))
+        return ax + ay;
+
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
     }
-  if (k != 0)
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
     {
-      uint32_t high;
-      t1 = 1.0;
-      GET_HIGH_WORD (high, t1);
-      SET_HIGH_WORD (t1, high + (k << 20));
-      w *= t1;
-      math_check_force_underflow_nonneg (w);
-      return w;
+      if (__glibc_unlikely (ax >= ay / EPS))
+        return math_narrow_eval (ax + ay);
+
+      ax = math_narrow_eval (kernel (ax / SCALE, ay / SCALE) * SCALE);
+      math_check_force_underflow_nonneg (ax);
+      return ax;
     }
-  else
-    return w;
+
+  /* Common case: ax is not huge and ay is not tiny.  */
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 #ifndef __ieee754_hypot
 libm_alias_finite (__ieee754_hypot, __hypot)

From patchwork Fri Dec 3 00:00:54 2021
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 48440
From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Paul Zimmermann, Wilco Dijkstra
Subject: [PATCH v4 03/12] math: Improve hypot performance with FMA
Date: Thu, 2 Dec 2021 21:00:54 -0300
Message-Id: <20211203000103.737833-4-adhemerval.zanella@linaro.org>
In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
References: <20211203000103.737833-1-adhemerval.zanella@linaro.org>

From: Wilco Dijkstra

Improve hypot performance significantly by using fma
when available.  The fma version has twice the throughput of the previous
version and 70% of the latency.  The non-fma version has 30% higher
throughput and 10% higher latency.

Max ULP error is 0.949 with fma and 0.792 without fma.

Passes GLIBC testsuite.
---
 sysdeps/ieee754/dbl-64/e_hypot.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c
index 274b14b57e..f53061badc 100644
--- a/sysdeps/ieee754/dbl-64/e_hypot.c
+++ b/sysdeps/ieee754/dbl-64/e_hypot.c
@@ -26,7 +26,11 @@
      rounding mode.
    - Handle required underflow exception for subnormal results.
 
-   The expected ULP is ~0.792.
+   The expected ULP is ~0.792 or ~0.948 if FMA is used.  For FMA, the
+   correction is not used and the error of sqrt (x^2 + y^2) is below 1 ULP
+   if x^2 + y^2 is computed with less than 0.707 ULP error.  If |x| >= |2y|,
+   fma (x, x, y^2) has ~0.625 ULP.  If |x| < |2y|, fma (|2x|, |y|, (x - y)^2)
+   has ~0.625 ULP.
 
    [1] https://arxiv.org/pdf/1904.09481.pdf  */
 
@@ -48,6 +52,16 @@ static inline double
 kernel (double ax, double ay)
 {
   double t1, t2;
+#ifdef __FP_FAST_FMA
+  t1 = ay + ay;
+  t2 = ax - ay;
+
+  if (t1 >= ax)
+    return sqrt (fma (t1, ax, t2 * t2));
+  else
+    return sqrt (fma (ax, ax, ay * ay));
+
+#else
   double h = sqrt (ax * ax + ay * ay);
   if (h <= 2.0 * ay)
     {
@@ -64,6 +78,7 @@ kernel (double ax, double ay)
 
   h -= (t1 + t2) / (2.0 * h);
   return h;
+#endif
 }
 
 double

From patchwork Fri Dec 3 00:00:55 2021
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 48441
From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Paul Zimmermann, Wilco Dijkstra
Subject: [PATCH v4 04/12] math: Use an improved algorithm for hypotl (ldbl-96)
Date: Thu, 2 Dec 2021 21:00:55 -0300
Message-Id: <20211203000103.737833-5-adhemerval.zanella@linaro.org>
In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
References: <20211203000103.737833-1-adhemerval.zanella@linaro.org>

This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1], using the MyHypot3 variant with the following
changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' case to avoid spurious underflow
   due to the multiplication, and fix the return value for upward
   rounding mode.
 - Handle the required underflow exception for subnormal results.

The main advantage of the new algorithm is its precision.
With a random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.02% results with an error of 1 ulp (23158 results) while the new implementation only shows 0.0001% of total (111). [1] https://arxiv.org/pdf/1904.09481.pdf --- sysdeps/ieee754/ldbl-96/e_hypotl.c | 231 ++++++++++++----------------- 1 file changed, 98 insertions(+), 133 deletions(-) diff --git a/sysdeps/ieee754/ldbl-96/e_hypotl.c b/sysdeps/ieee754/ldbl-96/e_hypotl.c index 44e72353c0..b32fad757f 100644 --- a/sysdeps/ieee754/ldbl-96/e_hypotl.c +++ b/sysdeps/ieee754/ldbl-96/e_hypotl.c @@ -1,142 +1,107 @@ -/* e_hypotl.c -- long double version of e_hypot.c. - */ - -/* - * ==================================================== - * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. - * - * Developed at SunPro, a Sun Microsystems, Inc. business. - * Permission to use, copy, modify, and distribute this - * software is freely granted, provided that this notice - * is preserved. - * ==================================================== - */ - -/* __ieee754_hypotl(x,y) - * - * Method : - * If (assume round-to-nearest) z=x*x+y*y - * has error less than sqrt(2)/2 ulp, than - * sqrt(z) has error less than 1 ulp (exercise). - * - * So, compute sqrt(x*x+y*y) with some care as - * follows to get the error below 1 ulp: - * - * Assume x>y>0; - * (if possible, set rounding to round-to-nearest) - * 1. if x > 2y use - * x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y - * where x1 = x with lower 32 bits cleared, x2 = x-x1; else - * 2. if x <= 2y use - * t1*y1+((x-y)*(x-y)+(t1*y2+t2*y)) - * where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1, - * y1= y with lower 32 bits chopped, y2 = y-y1. - * - * NOTE: scaling may be necessary if some argument is too - * large or too tiny - * - * Special cases: - * hypot(x,y) is INF if x or y is +INF or -INF; else - * hypot(x,y) is NAN if x or y is NAN. 
- * - * Accuracy: - * hypot(x,y) returns sqrt(x^2+y^2) with error less - * than 1 ulps (units in the last place) - */ +/* Euclidean distance function. Long Double/Binary96 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by + Carlos F. Borges [1] using the MyHypot3 with the following changes: + + - Handle qNaN and sNaN. + - Tune the 'widely varying operands' to avoid spurious underflow + due the multiplication and fix the return value for upwards + rounding mode. + - Handle required underflow exception for subnormal results. + + [1] https://arxiv.org/pdf/1904.09481.pdf */ #include #include #include #include -long double __ieee754_hypotl(long double x, long double y) +#define SCALE 0x8p-8257L +#define LARGE_VAL 0xb.504f333f9de6484p+8188L +#define TINY_VAL 0x8p-8194L +#define EPS 0x8p-68L + +/* Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0 + and squaring ax, ay and (ax - ay) does not overflow or underflow. 
*/ +static inline long double +kernel (long double ax, long double ay) { - long double a,b,t1,t2,y1,y2,w; - uint32_t j,k,ea,eb; - - GET_LDOUBLE_EXP(ea,x); - ea &= 0x7fff; - GET_LDOUBLE_EXP(eb,y); - eb &= 0x7fff; - if(eb > ea) {a=y;b=x;j=ea; ea=eb;eb=j;} else {a=x;b=y;} - SET_LDOUBLE_EXP(a,ea); /* a <- |a| */ - SET_LDOUBLE_EXP(b,eb); /* b <- |b| */ - if((ea-eb)>0x46) {return a+b;} /* x/y > 2**70 */ - k=0; - if(__builtin_expect(ea > 0x5f3f,0)) { /* a>2**8000 */ - if(ea == 0x7fff) { /* Inf or NaN */ - uint32_t exp __attribute__ ((unused)); - uint32_t high,low; - w = a+b; /* for sNaN */ - if (issignaling (a) || issignaling (b)) - return w; - GET_LDOUBLE_WORDS(exp,high,low,a); - if(((high&0x7fffffff)|low)==0) w = a; - GET_LDOUBLE_WORDS(exp,high,low,b); - if(((eb^0x7fff)|(high&0x7fffffff)|low)==0) w = b; - return w; - } - /* scale a and b by 2**-9600 */ - ea -= 0x2580; eb -= 0x2580; k += 9600; - SET_LDOUBLE_EXP(a,ea); - SET_LDOUBLE_EXP(b,eb); - } - if(__builtin_expect(eb < 0x20bf, 0)) { /* b < 2**-8000 */ - if(eb == 0) { /* subnormal b or 0 */ - uint32_t exp __attribute__ ((unused)); - uint32_t high,low; - GET_LDOUBLE_WORDS(exp,high,low,b); - if((high|low)==0) return a; - SET_LDOUBLE_WORDS(t1, 0x7ffd, 0x80000000, 0); /* t1=2^16382 */ - b *= t1; - a *= t1; - k -= 16382; - GET_LDOUBLE_EXP (ea, a); - GET_LDOUBLE_EXP (eb, b); - if (eb > ea) - { - t1 = a; - a = b; - b = t1; - j = ea; - ea = eb; - eb = j; - } - } else { /* scale a and b by 2^9600 */ - ea += 0x2580; /* a *= 2^9600 */ - eb += 0x2580; /* b *= 2^9600 */ - k -= 9600; - SET_LDOUBLE_EXP(a,ea); - SET_LDOUBLE_EXP(b,eb); - } - } - /* medium size a and b */ - w = a-b; - if (w>b) { - uint32_t high; - GET_LDOUBLE_MSW(high,a); - SET_LDOUBLE_WORDS(t1,ea,high,0); - t2 = a-t1; - w = sqrtl(t1*t1-(b*(-b)-t2*(a+t1))); - } else { - uint32_t high; - GET_LDOUBLE_MSW(high,b); - a = a+a; - SET_LDOUBLE_WORDS(y1,eb,high,0); - y2 = b - y1; - GET_LDOUBLE_MSW(high,a); - SET_LDOUBLE_WORDS(t1,ea+1,high,0); - t2 = a - t1; - w = 
sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b)));
-       }
-       if(k!=0) {
-           uint32_t exp;
-           t1 = 1.0;
-           GET_LDOUBLE_EXP(exp,t1);
-           SET_LDOUBLE_EXP(t1,exp+k);
-           w *= t1;
-           math_check_force_underflow_nonneg (w);
-           return w;
-       } else return w;
+  long double t1, t2;
+  long double h = sqrtl (ax * ax + ay * ay);
+  if (h <= 2.0L * ay)
+    {
+      long double delta = h - ay;
+      t1 = ax * (2.0L * delta - ax);
+      t2 = (delta - 2.0L * (ax - ay)) * delta;
+    }
+  else
+    {
+      long double delta = h - ax;
+      t1 = 2.0L * delta * (ax - 2.0L * ay);
+      t2 = (4.0L * delta - ay) * ay + delta * delta;
+    }
+
+  h -= (t1 + t2) / (2.0L * h);
+  return h;
+}
+
+long double
+__ieee754_hypotl (long double x, long double y)
+{
+  if (!isfinite(x) || !isfinite(y))
+    {
+      if ((isinf (x) || isinf (y))
+          && !issignaling (x) && !issignaling (y))
+        return INFINITY;
+      return x + y;
+    }
+
+  x = fabsl (x);
+  y = fabsl (y);
+
+  long double ax = x < y ? y : x;
+  long double ay = x < y ? x : y;
+
+  /* If ax is huge, scale both inputs down.  */
+  if (__glibc_unlikely (ax > LARGE_VAL))
+    {
+      if (__glibc_unlikely (ay <= ax * EPS))
+        return ax + ay;
+
+      return kernel (ax * SCALE, ay * SCALE) / SCALE;
+    }
+
+  /* If ay is tiny, scale both inputs up.  */
+  if (__glibc_unlikely (ay < TINY_VAL))
+    {
+      if (__glibc_unlikely (ax >= ay / EPS))
+        return ax;
+
+      ax = kernel (ax / SCALE, ay / SCALE) * SCALE;
+      math_check_force_underflow_nonneg (ax);
+      return ax;
+    }
+
+  /* Common case: ax is not huge and ay is not tiny.
*/
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 libm_alias_finite (__ieee754_hypotl, __hypotl)

From patchwork Fri Dec 3 00:00:56 2021
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 48442
To: libc-alpha@sourceware.org, Paul Zimmermann, Wilco Dijkstra
Subject: [PATCH v4 05/12] math: Use an improved algorithm for hypotl (ldbl-128)
Date: Thu, 2 Dec 2021 21:00:56 -0300
Message-Id: <20211203000103.737833-6-adhemerval.zanella@linaro.org>
In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
References: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

This implementation is based on
'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1], using the
MyHypot3 variant with the following changes:

  - Handle qNaN and sNaN.
  - Tune the 'widely varying operands' check to avoid spurious underflow
    due to the multiplication, and fix the return value for the upward
    rounding mode.
  - Handle the required underflow exception for subnormal results.

The main advantage of the new algorithm is its precision.  With 1e9
random input pairs in the range [LDBL_MIN, LDBL_MAX], the current glibc
implementation shows around 0.05% of results with an error of 1 ulp
(453266 results), while the new implementation shows only 0.0001% of
the total (1280).

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
---
 sysdeps/ieee754/ldbl-128/e_hypotl.c | 226 ++++++++++++----------------
 1 file changed, 96 insertions(+), 130 deletions(-)

diff --git a/sysdeps/ieee754/ldbl-128/e_hypotl.c b/sysdeps/ieee754/ldbl-128/e_hypotl.c
index cd4fdbc4a6..022fa9aaf7 100644
--- a/sysdeps/ieee754/ldbl-128/e_hypotl.c
+++ b/sysdeps/ieee754/ldbl-128/e_hypotl.c
@@ -1,141 +1,107 @@
-/* e_hypotl.c -- long double version of e_hypot.c.
- */
-
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
-/* __ieee754_hypotl(x,y)
- *
- * Method :
- *      If (assume round-to-nearest) z=x*x+y*y
- *      has error less than sqrtl(2)/2 ulp, than
- *      sqrtl(z) has error less than 1 ulp (exercise).
- *
- *      So, compute sqrtl(x*x+y*y) with some care as
- *      follows to get the error below 1 ulp:
- *
- *      Assume x>y>0;
- *      (if possible, set rounding to round-to-nearest)
- *      1. if x > 2y use
- *              x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
- *      where x1 = x with lower 64 bits cleared, x2 = x-x1; else
- *      2.
if x <= 2y use - * t1*y1+((x-y)*(x-y)+(t1*y2+t2*y)) - * where t1 = 2x with lower 64 bits cleared, t2 = 2x-t1, - * y1= y with lower 64 bits chopped, y2 = y-y1. - * - * NOTE: scaling may be necessary if some argument is too - * large or too tiny - * - * Special cases: - * hypotl(x,y) is INF if x or y is +INF or -INF; else - * hypotl(x,y) is NAN if x or y is NAN. - * - * Accuracy: - * hypotl(x,y) returns sqrtl(x^2+y^2) with error less - * than 1 ulps (units in the last place) - */ +/* Euclidean distance function. Long Double/Binary128 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* This implementation is based on 'An Improved Algorithm for hypot(a,b)' by + Carlos F. Borges [1] using the MyHypot3 with the following changes: + + - Handle qNaN and sNaN. + - Tune the 'widely varying operands' to avoid spurious underflow + due the multiplication and fix the return value for upwards + rounding mode. + - Handle required underflow exception for subnormal results. + + [1] https://arxiv.org/pdf/1904.09481.pdf */ #include #include #include #include +#define SCALE L(0x1p-8303) +#define LARGE_VAL L(0x1.6a09e667f3bcc908b2fb1366ea95p+8191) +#define TINY_VAL L(0x1p-8191) +#define EPS L(0x1p-114) + +/* Hypot kernel. 
The inputs must be adjusted so that ax >= ay >= 0 + and squaring ax, ay and (ax - ay) does not overflow or underflow. */ +static inline _Float128 +kernel (_Float128 ax, _Float128 ay) +{ + _Float128 t1, t2; + _Float128 h = sqrtl (ax * ax + ay * ay); + if (h <= L(2.0) * ay) + { + _Float128 delta = h - ay; + t1 = ax * (L(2.0) * delta - ax); + t2 = (delta - L(2.0) * (ax - ay)) * delta; + } + else + { + _Float128 delta = h - ax; + t1 = L(2.0) * delta * (ax - L(2.0) * ay); + t2 = (L(4.0) * delta - ay) * ay + delta * delta; + } + + h -= (t1 + t2) / (L(2.0) * h); + return h; +} + _Float128 __ieee754_hypotl(_Float128 x, _Float128 y) { - _Float128 a,b,t1,t2,y1,y2,w; - int64_t j,k,ha,hb; - - GET_LDOUBLE_MSW64(ha,x); - ha &= 0x7fffffffffffffffLL; - GET_LDOUBLE_MSW64(hb,y); - hb &= 0x7fffffffffffffffLL; - if(hb > ha) {a=y;b=x;j=ha; ha=hb;hb=j;} else {a=x;b=y;} - SET_LDOUBLE_MSW64(a,ha); /* a <- |a| */ - SET_LDOUBLE_MSW64(b,hb); /* b <- |b| */ - if((ha-hb)>0x78000000000000LL) {return a+b;} /* x/y > 2**120 */ - k=0; - if(ha > 0x5f3f000000000000LL) { /* a>2**8000 */ - if(ha >= 0x7fff000000000000LL) { /* Inf or NaN */ - uint64_t low; - w = a+b; /* for sNaN */ - if (issignaling (a) || issignaling (b)) - return w; - GET_LDOUBLE_LSW64(low,a); - if(((ha&0xffffffffffffLL)|low)==0) w = a; - GET_LDOUBLE_LSW64(low,b); - if(((hb^0x7fff000000000000LL)|low)==0) w = b; - return w; - } - /* scale a and b by 2**-9600 */ - ha -= 0x2580000000000000LL; - hb -= 0x2580000000000000LL; k += 9600; - SET_LDOUBLE_MSW64(a,ha); - SET_LDOUBLE_MSW64(b,hb); - } - if(hb < 0x20bf000000000000LL) { /* b < 2**-8000 */ - if(hb <= 0x0000ffffffffffffLL) { /* subnormal b or 0 */ - uint64_t low; - GET_LDOUBLE_LSW64(low,b); - if((hb|low)==0) return a; - t1=0; - SET_LDOUBLE_MSW64(t1,0x7ffd000000000000LL); /* t1=2^16382 */ - b *= t1; - a *= t1; - k -= 16382; - GET_LDOUBLE_MSW64 (ha, a); - GET_LDOUBLE_MSW64 (hb, b); - if (hb > ha) - { - t1 = a; - a = b; - b = t1; - j = ha; - ha = hb; - hb = j; - } - } else { /* scale a and 
b by 2^9600 */ - ha += 0x2580000000000000LL; /* a *= 2^9600 */ - hb += 0x2580000000000000LL; /* b *= 2^9600 */ - k -= 9600; - SET_LDOUBLE_MSW64(a,ha); - SET_LDOUBLE_MSW64(b,hb); - } - } - /* medium size a and b */ - w = a-b; - if (w>b) { - t1 = 0; - SET_LDOUBLE_MSW64(t1,ha); - t2 = a-t1; - w = sqrtl(t1*t1-(b*(-b)-t2*(a+t1))); - } else { - a = a+a; - y1 = 0; - SET_LDOUBLE_MSW64(y1,hb); - y2 = b - y1; - t1 = 0; - SET_LDOUBLE_MSW64(t1,ha+0x0001000000000000LL); - t2 = a - t1; - w = sqrtl(t1*y1-(w*(-w)-(t1*y2+t2*b))); - } - if(k!=0) { - uint64_t high; - t1 = 1; - GET_LDOUBLE_MSW64(high,t1); - SET_LDOUBLE_MSW64(t1,high+(k<<48)); - w *= t1; - math_check_force_underflow_nonneg (w); - return w; - } else return w; + if (!isfinite(x) || !isfinite(y)) + { + if ((isinf (x) || isinf (y)) + && !issignaling (x) && !issignaling (y)) + return INFINITY; + return x + y; + } + + x = fabsl (x); + y = fabsl (y); + + _Float128 ax = x < y ? y : x; + _Float128 ay = x < y ? x : y; + + /* If ax is huge, scale both inputs down. */ + if (__glibc_unlikely (ax > LARGE_VAL)) + { + if (__glibc_unlikely (ay <= ax * EPS)) + return ax + ay; + + return kernel (ax * SCALE, ay * SCALE) / SCALE; + } + + /* If ay is tiny, scale both inputs up. */ + if (__glibc_unlikely (ay < TINY_VAL)) + { + if (__glibc_unlikely (ax >= ay / EPS)) + return ax; + + ax = kernel (ax / SCALE, ay / SCALE) * SCALE; + math_check_force_underflow_nonneg (ax); + return ax; + } + + /* Common case: ax is not huge and ay is not tiny. 
*/
+  if (__glibc_unlikely (ay <= ax * EPS))
+    return ax + ay;
+
+  return kernel (ax, ay);
 }
 libm_alias_finite (__ieee754_hypotl, __hypotl)

From patchwork Fri Dec 3 00:00:57 2021
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 48444
To: libc-alpha@sourceware.org, Paul Zimmermann, Wilco Dijkstra
Subject: [PATCH v4 06/12] math: Remove powerpc e_hypot
Date: Thu, 2 Dec 2021 21:00:57 -0300
Message-Id: <20211203000103.737833-7-adhemerval.zanella@linaro.org>
In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
References: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

The generic implementation shows only slightly worse
performance:

POWER9         reciprocal-throughput    latency
master                       13.4024    14.0967
new hypot                    14.8479    15.8061

POWER8         reciprocal-throughput    latency
master                       15.5767    16.8885
new hypot                    16.5371    18.4057

One way to improve this might be to make gcc generate xsmaxdp/xsmindp
for fmax/fmin (it only does so with -ffast-math; clang does so with
default options).

Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
---
 sysdeps/powerpc/fpu/e_hypot.c                 | 87 -------------------
 sysdeps/powerpc/fpu/e_hypotf.c                | 78 -----------------
 .../powerpc32/power4/fpu/multiarch/Makefile   |  5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     | 23 -----
 .../power4/fpu/multiarch/e_hypot-ppc32.c      | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  | 33 -------
 .../power4/fpu/multiarch/e_hypotf-power7.c    | 23 -----
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
 9 files changed, 1 insertion(+), 327 deletions(-)
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c

diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
deleted file mode 100644
index f96c589bbd..0000000000
--- a/sysdeps/powerpc/fpu/e_hypot.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/* Pythagorean addition using doubles
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Library General Public License as - published by the Free Software Foundation; either version 2 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Library General Public License for more details. - - You should have received a copy of the GNU Library General Public - License along with the GNU C Library; see the file COPYING.LIB. If - not, see . */ - -#include -#include -#include -#include -#include - -/* __ieee754_hypot(x,y) - * - * This a FP only version without any FP->INT conversion. - * It is similar to default C version, making appropriates - * overflow and underflows checks as well scaling when it - * is needed. - */ - -double -__ieee754_hypot (double x, double y) -{ - if ((isinf (x) || isinf (y)) - && !issignaling (x) && !issignaling (y)) - return INFINITY; - if (isnan (x) || isnan (y)) - return x + y; - - x = fabs (x); - y = fabs (y); - - if (y > x) - { - double t = x; - x = y; - y = t; - } - if (y == 0.0) - return x; - - /* if y is higher enough, y * 2^60 might overflow. The tests if - y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the - appropriate check to avoid the overflow exception generation. 
*/ - if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60)) - return x + y; - - if (x > 0x1p+500) - { - x *= 0x1p-600; - y *= 0x1p-600; - return sqrt (x * x + y * y) / 0x1p-600; - } - if (y < 0x1p-500) - { - if (y <= 0x0.fffffffffffffp-1022) - { - x *= 0x1p+1022; - y *= 0x1p+1022; - double ret = sqrt (x * x + y * y) / 0x1p+1022; - math_check_force_underflow_nonneg (ret); - return ret; - } - else - { - x *= 0x1p+600; - y *= 0x1p+600; - return sqrt (x * x + y * y) / 0x1p+600; - } - } - return sqrt (x * x + y * y); -} -#ifndef __ieee754_hypot -libm_alias_finite (__ieee754_hypot, __hypot) -#endif diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c deleted file mode 100644 index fa201dda51..0000000000 --- a/sysdeps/powerpc/fpu/e_hypotf.c +++ /dev/null @@ -1,78 +0,0 @@ -/* Pythagorean addition using floats - Copyright (C) 2011-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Library General Public License as - published by the Free Software Foundation; either version 2 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Library General Public License for more details. - - You should have received a copy of the GNU Library General Public - License along with the GNU C Library; see the file COPYING.LIB. If - not, see . */ - -#include -#include -#include -#include - -/* __ieee754_hypotf(x,y) - - This a FP only version without any FP->INT conversion. - It is similar to default C version, making appropriates - overflow and underflows checks as using double precision - instead of scaling. */ - -#ifdef _ARCH_PWR7 -/* POWER7 isinf and isnan optimizations are fast. 
*/ -# define TEST_INF_NAN(x, y) \ - if ((isinff(x) || isinff(y)) \ - && !issignaling (x) && !issignaling (y)) \ - return INFINITY; \ - if (isnanf(x) || isnanf(y)) \ - return x + y; -# else -/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are - * costly (especially for POWER6). */ -# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2) \ - do { \ - ieee_float_shape_type gf_u1; \ - ieee_float_shape_type gf_u2; \ - gf_u1.value = (f1); \ - gf_u2.value = (f2); \ - (i1) = gf_u1.word & 0x7fffffff; \ - (i2) = gf_u2.word & 0x7fffffff; \ - } while (0) - -# define TEST_INF_NAN(x, y) \ - do { \ - uint32_t hx, hy; \ - GET_TWO_FLOAT_WORD(x, y, hx, hy); \ - if (hy > hx) { \ - uint32_t ht = hx; hx = hy; hy = ht; \ - } \ - if (hx >= 0x7f800000) { \ - if ((hx == 0x7f800000 || hy == 0x7f800000) \ - && !issignaling (x) && !issignaling (y)) \ - return INFINITY; \ - return x + y; \ - } \ - } while (0) -#endif - - -float -__ieee754_hypotf (float x, float y) -{ - TEST_INF_NAN (x, y); - - return sqrt ((double) x * x + (double) y * y); -} -#ifndef __ieee754_hypotf -libm_alias_finite (__ieee754_hypotf, __hypotf) -#endif diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile index 60f2c95532..1de0f9b350 100644 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile +++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile @@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \ s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \ s_modff-power5+ s_modff-ppc32 s_logbl-power7 \ s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \ - s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \ - e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32 + s_logbf-power7 s_logbf-ppc32 CFLAGS-s_llrintf-power6.c += -mcpu=power6 CFLAGS-s_llrintf-ppc32.c += -mcpu=power4 @@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+ CFLAGS-s_logbl-power7.c = -mcpu=power7 CFLAGS-s_logb-power7.c = -mcpu=power7 CFLAGS-s_logbf-power7.c = 
-mcpu=power7 -CFLAGS-e_hypot-power7.c = -mcpu=power7 -CFLAGS-e_hypotf-power7.c = -mcpu=power7 # These files quiet sNaNs in a way that is optimized away without # -fsignaling-nans. diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c deleted file mode 100644 index 382b4a0b27..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee_hypot() POWER7 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include - -#define __ieee754_hypot __ieee754_hypot_power7 - -#include diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c deleted file mode 100644 index abb14d5469..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee_hypot() PowerPC32 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include - -#define __ieee754_hypot __ieee754_hypot_ppc32 - -#include diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c deleted file mode 100644 index a16efa350c..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c +++ /dev/null @@ -1,33 +0,0 @@ -/* Multiple versions of ieee754_hypot. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . 
*/ - -#include -#include -#include -#include -#include "init-arch.h" - -extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden; -extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden; - -libc_ifunc (__ieee754_hypot, - (hwcap & PPC_FEATURE_ARCH_2_06) - ? __ieee754_hypot_power7 - : __ieee754_hypot_ppc32); - -libm_alias_finite (__ieee754_hypot, __hypot) diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c deleted file mode 100644 index f8a26ff22f..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee754_hypot POWER7 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include - -#define __ieee754_hypotf __ieee754_hypotf_power7 - -#include diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c deleted file mode 100644 index b13f8c9db2..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee_hypot() PowerPC32 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include - -#define __ieee754_hypotf __ieee754_hypotf_ppc32 - -#include diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c deleted file mode 100644 index 1e72605db8..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c +++ /dev/null @@ -1,33 +0,0 @@ -/* Multiple versions of ieee754_hypotf. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . 
*/ - -#include -#include -#include -#include -#include "init-arch.h" - -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden; -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden; - -libc_ifunc (__ieee754_hypotf, - (hwcap & PPC_FEATURE_ARCH_2_06) - ? __ieee754_hypotf_power7 - : __ieee754_hypotf_ppc32); - -libm_alias_finite (__ieee754_hypotf, __hypotf) From patchwork Fri Dec 3 00:00:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 48443 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B7EC7385BF86 for ; Fri, 3 Dec 2021 00:05:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B7EC7385BF86 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1638489955; bh=RB/ILmpaCJp/59MjK713Nppf75aZt/neu0DpefBMZZo=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=CQhTHx1L/tGdxdCAGqQa8Wa6mUMxDSm1DSu4tfj1MEjjJa+dlDfNppYILA3N5KRs1 r/lP516EaMNTIkmACozqa85L5tbV4q/QIzIcUHx6fEn/rpI8G5mVAIWbBFa0AxLQTi aBcl2SMEx/OLFlURCR4BxjXr2svz0n845MZkssU0= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by sourceware.org (Postfix) with ESMTPS id 491EC385BF9B for ; Fri, 3 Dec 2021 00:01:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 491EC385BF9B Received: by mail-qv1-xf29.google.com with SMTP id g9so1171452qvd.2 for ; Thu, 02 Dec 2021 16:01:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=RB/ILmpaCJp/59MjK713Nppf75aZt/neu0DpefBMZZo=; b=uzl6FSZRhm+vciY/Wezaq3ld6ETCdZYPPHoqwQLZogjq+SIlx/csV9GhKtM7Fyx87J ozTKiombk8r1T3nCvOZOoGHqPK0zWsT01hDtmmYEj7lM7rPJDw5i/fezbQCQLZYYXO4W 9/HymbS579xW3DBj/qL9SES/XY/mp2sBxnm36gckJIrs6IpGYLIcnWYmihN+7WIE5zfw erIE/b7LXp8hC/JKpBNcAb/RlWm24Yz3g6ImLNp6pS233RoNccbKDfj81WWojNL3F1P1 Qv2u3JpnTQ3P1qdegCiU/D2hneEI9HjbeS9XqzQ3TIkA3jbk4daEFmjeR56luJlMCyMq jj3w== X-Gm-Message-State: AOAM530z+8/gNTd2hMvuLlOwYIhWLHlJdK595mQizHwsDpputiTVnMoD P4d9sc+Ir+4yKhjkOLhD8+fgTML7zqvm3w== X-Google-Smtp-Source: ABdhPJyigMA8NCfZ3Kg1fPrLgyI2/LAgAZZOXi4FgB4921DmSysrqz03aZ5LDQxcDB3D9rkNu+cYzQ== X-Received: by 2002:a05:6214:20e3:: with SMTP id 3mr16265653qvk.47.1638489679616; Thu, 02 Dec 2021 16:01:19 -0800 (PST) Received: from birita.. ([2804:431:c7cb:30f8:3030:59d3:d31c:ed39]) by smtp.gmail.com with ESMTPSA id m9sm938714qkn.59.2021.12.02.16.01.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Dec 2021 16:01:19 -0800 (PST) To: libc-alpha@sourceware.org, Paul Zimmermann , Wilco Dijkstra Subject: [PATCH v4 07/12] i386: Move hypot implementation to C Date: Thu, 2 Dec 2021 21:00:58 -0300 Message-Id: <20211203000103.737833-8-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org> References: <20211203000103.737833-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella 
via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The generic hypotf is slightly slower, mostly due to the tricks the assembly uses to optimize the isinf/isnan/issignaling checks. The generic hypot is much slower, since the assembly implementation relies on the i386 default excess precision to issue the operations directly. A similar C implementation is therefore provided instead of using the generic one. Checked on i686-linux-gnu. --- sysdeps/i386/fpu/e_hypot.S | 75 ------------------------------------- sysdeps/i386/fpu/e_hypot.c | 46 +++++++++++++++++++++++ sysdeps/i386/fpu/e_hypotf.S | 64 ------------------------------- 3 files changed, 46 insertions(+), 139 deletions(-) delete mode 100644 sysdeps/i386/fpu/e_hypot.S create mode 100644 sysdeps/i386/fpu/e_hypot.c delete mode 100644 sysdeps/i386/fpu/e_hypotf.S diff --git a/sysdeps/i386/fpu/e_hypot.S b/sysdeps/i386/fpu/e_hypot.S deleted file mode 100644 index f2c956b77a..0000000000 --- a/sysdeps/i386/fpu/e_hypot.S +++ /dev/null @@ -1,75 +0,0 @@ -/* Compute the hypothenuse of X and Y. - Copyright (C) 1998-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . 
*/ - -#include -#include -#include - -DEFINE_DBL_MIN - -#ifdef PIC -# define MO(op) op##@GOTOFF(%edx) -#else -# define MO(op) op -#endif - - .text -ENTRY(__ieee754_hypot) -#ifdef PIC - LOAD_PIC_REG (dx) -#endif - fldl 4(%esp) // x - fxam - fnstsw - fldl 12(%esp) // y : x - movb %ah, %ch - fxam - fnstsw - movb %ah, %al - orb %ch, %ah - sahf - jc 1f - fmul %st(0) // y * y : x - fxch // x : y * y - fmul %st(0) // x * x : y * y - faddp // x * x + y * y - fsqrt - DBL_NARROW_EVAL_UFLOW_NONNEG -2: ret - - // We have to test whether any of the parameters is Inf. - // In this case the result is infinity. -1: andb $0x45, %al - cmpb $5, %al - je 3f // jump if y is Inf - andb $0x45, %ch - cmpb $5, %ch - jne 4f // jump if x is not Inf - fxch -3: fstp %st(1) - fabs - jmp 2b - -4: testb $1, %al - jnz 5f // y is NaN - fxch -5: fstp %st(1) - jmp 2b - -END(__ieee754_hypot) -libm_alias_finite (__ieee754_hypot, __hypot) diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c new file mode 100644 index 0000000000..b7c068e734 --- /dev/null +++ b/sysdeps/i386/fpu/e_hypot.c @@ -0,0 +1,46 @@ +/* Euclidean distance function. Double/Binary64 i386 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include +#include +#include +#include +#include + +/* On i386 the default excess precision can be used to optimize the + hypot implementation, since the internal multiplications and sqrt are + carried out in the 80-bit FP type. */ +double +__ieee754_hypot (double x, double y) +{ + if (!isfinite (x) || !isfinite (y)) + { + if ((isinf (x) || isinf (y)) + && !issignaling (x) && !issignaling (y)) + return INFINITY; + return x + y; + } + + long double lx = x; + long double ly = y; + double r = math_narrow_eval (sqrtl (lx * lx + ly * ly)); + math_check_force_underflow_nonneg (r); + return r; +} +libm_alias_finite (__ieee754_hypot, __hypot) diff --git a/sysdeps/i386/fpu/e_hypotf.S b/sysdeps/i386/fpu/e_hypotf.S deleted file mode 100644 index cec5d15403..0000000000 --- a/sysdeps/i386/fpu/e_hypotf.S +++ /dev/null @@ -1,64 +0,0 @@ -/* Compute the hypothenuse of X and Y. - Copyright (C) 1998-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . 
*/ - -#include -#include -#include - - .text -ENTRY(__ieee754_hypotf) - flds 4(%esp) // x - fxam - fnstsw - flds 8(%esp) // y : x - movb %ah, %ch - fxam - fnstsw - movb %ah, %al - orb %ch, %ah - sahf - jc 1f - fmul %st(0) // y * y : x - fxch // x : y * y - fmul %st(0) // x * x : y * y - faddp // x * x + y * y - fsqrt - FLT_NARROW_EVAL -2: ret - - // We have to test whether any of the parameters is Inf. - // In this case the result is infinity. -1: andb $0x45, %al - cmpb $5, %al - je 3f // jump if y is Inf - andb $0x45, %ch - cmpb $5, %ch - jne 4f // jump if x is not Inf - fxch -3: fstp %st(1) - fabs - jmp 2b - -4: testb $1, %al - jnz 5f // y is NaN - fxch -5: fstp %st(1) - jmp 2b - -END(__ieee754_hypotf) -libm_alias_finite (__ieee754_hypotf, __hypotf) From patchwork Fri Dec 3 00:00:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 48445 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4502B385BF9C for ; Fri, 3 Dec 2021 00:07:26 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4502B385BF9C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1638490046; bh=11L5J/Z2b/Z5Vb4Y7hvgLQ+8GR+Dp1zDEPEFV95x9xw=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=k3zm64xbcVyb78j444fHTgiG8AbI4JzNHXKgjtynYoFge4MRi5OoL5Zf5gFaKW6QL QveFq4Z9OCHs6Dqxw1ItLBpHgWDjq4fpw2qiXlA5KCugE3trttX+JIEcfYECbAKxpe Pt63muMbIHfAFredZcf/zxXoDCpRo3dPZqbXD6f4= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qv1-xf30.google.com (mail-qv1-xf30.google.com [IPv6:2607:f8b0:4864:20::f30]) by sourceware.org (Postfix) with ESMTPS id 8CF0A385BF83 for ; Fri, 3 Dec 2021 
00:01:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8CF0A385BF83 Received: by mail-qv1-xf30.google.com with SMTP id a24so1152626qvb.5 for ; Thu, 02 Dec 2021 16:01:21 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=11L5J/Z2b/Z5Vb4Y7hvgLQ+8GR+Dp1zDEPEFV95x9xw=; b=YfJsb5fhsk0pxtWfPPOxmKb7EPmmA0Dr2PejInZSaZfgmNkXEDP9VnF7TiQu1E7TlA /4ECD4PooOk0/JW378BPzCu1NLnHb3eNJJ2TsW8Znd7vLuGbwGkOu5h9BfgPTsISWrTi 7/g6ewCc+3NuPgaiT2DjBmMiWyO6O8IrVsuKo/HQSifucjMuZ6B/NKFYrCuCrRDocQVL K1usMlnt3FkNnA+jEND3lUVvBtBMGR7r9gEY/5gVXFTKDyGAtg9+XmgRfkNFHeEyjakO AUcAo70a/SRDdbzpixbEbKdoR5ZvNeauRQd1SB3kL+iqZlRSYi0eMmd53U7f9UER0P7B gxqg== X-Gm-Message-State: AOAM5334Jrprg3pZA2eiTqb0yVDMtHGGHZ8Q3VXhJisvFo75DYjyWG8s +sjP2wa3Nld7SCOA0a+WURYKJsCOL0g+6Q== X-Google-Smtp-Source: ABdhPJz7Vzd2fGHvp7BDf/acgTx+lKMHaGBYpAWqrmHKh8AG/xqmeqLDU3Texv8vGqD/uQWd7Kb0nQ== X-Received: by 2002:a05:6214:3012:: with SMTP id ke18mr16733117qvb.63.1638489681015; Thu, 02 Dec 2021 16:01:21 -0800 (PST) Received: from birita.. 
([2804:431:c7cb:30f8:3030:59d3:d31c:ed39]) by smtp.gmail.com with ESMTPSA id m9sm938714qkn.59.2021.12.02.16.01.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Dec 2021 16:01:20 -0800 (PST) To: libc-alpha@sourceware.org, Paul Zimmermann , Wilco Dijkstra Subject: [PATCH v4 08/12] math: Add math-use-builtins-fmax.h Date: Thu, 2 Dec 2021 21:00:59 -0300 Message-Id: <20211203000103.737833-9-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org> References: <20211203000103.737833-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" This allows an architecture to use the compiler builtin instead of the generic implementation. --- math/s_fmax_template.c | 27 ++++++++++++++++++++++++ sysdeps/generic/math-use-builtins-fmax.h | 4 ++++ sysdeps/generic/math-use-builtins.h | 1 + 3 files changed, 32 insertions(+) create mode 100644 sysdeps/generic/math-use-builtins-fmax.h diff --git a/math/s_fmax_template.c b/math/s_fmax_template.c index d817406f04..4417fdb283 100644 --- a/math/s_fmax_template.c +++ b/math/s_fmax_template.c @@ -17,10 +17,37 @@ . 
*/ #include +#include + +#if __HAVE_FLOAT128 +# define USE_BUILTIN_F128 , _Float128 : USE_FMAXF128_BUILTIN +# define BUILTIN_F128 , _Float128 : __builtin_fmaxf128 +#else +# define USE_BUILTIN_F128 +# define BUILTIN_F128 +#endif + +#define USE_BUILTIN(X, Y) \ + _Generic((X), \ + float : USE_FMAXF_BUILTIN, \ + double : USE_FMAX_BUILTIN, \ + long double : USE_FMAXL_BUILTIN \ + USE_BUILTIN_F128) + +#define BUILTIN(X, Y) \ + _Generic((X), \ + float : __builtin_fmaxf, \ + double : __builtin_fmax, \ + long double : __builtin_fmaxl \ + BUILTIN_F128) \ + (X, Y) FLOAT M_DECL_FUNC (__fmax) (FLOAT x, FLOAT y) { + if (USE_BUILTIN (x, y)) + return BUILTIN (x, y); + if (isgreaterequal (x, y)) return x; else if (isless (x, y)) diff --git a/sysdeps/generic/math-use-builtins-fmax.h b/sysdeps/generic/math-use-builtins-fmax.h new file mode 100644 index 0000000000..8fc4efca6a --- /dev/null +++ b/sysdeps/generic/math-use-builtins-fmax.h @@ -0,0 +1,4 @@ +#define USE_FMAX_BUILTIN 0 +#define USE_FMAXF_BUILTIN 0 +#define USE_FMAXL_BUILTIN 0 +#define USE_FMAXF128_BUILTIN 0 diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h index 19d2d1cf3c..e07bba242f 100644 --- a/sysdeps/generic/math-use-builtins.h +++ b/sysdeps/generic/math-use-builtins.h @@ -34,5 +34,6 @@ #include #include #include +#include #endif /* MATH_USE_BUILTINS_H */ From patchwork Fri Dec 3 00:01:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 48446 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B93EB385BF9C for ; Fri, 3 Dec 2021 00:08:08 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B93EB385BF9C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1638490088; 
bh=vUw2nrILvigkEZuDyHveY0IdSnRSBlMKLJzoDRtkWDU=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=wQ3iMwDMdc2zLuKoKQEnHw1P+xtvoQ8a4dbYGz8f3OUgpG9cxiZnRizWPFRkH8CAV hG97gB4jNh1yBtHW/v83wIA725ZUmilzDkOnOqbGM+dtoR2Jyi+vdrHBTOohtJvGfY jmxZYN72tEvEoUuW0d/4YfuDfFocMcpZnQm+XOQo= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qk1-x731.google.com (mail-qk1-x731.google.com [IPv6:2607:f8b0:4864:20::731]) by sourceware.org (Postfix) with ESMTPS id E5E2D385C413 for ; Fri, 3 Dec 2021 00:01:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E5E2D385C413 Received: by mail-qk1-x731.google.com with SMTP id b67so1724079qkg.6 for ; Thu, 02 Dec 2021 16:01:22 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vUw2nrILvigkEZuDyHveY0IdSnRSBlMKLJzoDRtkWDU=; b=IyeQSxQabEE8QHW6yfh447PUXUGzg4XR2d9VF94OqwwuBiFaoGGRjzJW3qfgxbn5jU 4w4UNvs0pGC2J6GUzd5L1Nvhr27Q8P2TUGYAEtnKpdGYEogOhBDBzCao90Ca7gCU+lpB DsI7q0Wuq3UZd09t5JE+o5/pkCq7Z8KpQ1bjWCN6Ct2ic7GzNPb/UQvea1lUGTRdlg1n 9nRetcmZejNcIxT6inIcj7IQoBvIC/sWkKRmRzbYe1DT2VuvHrdYg4Amtf2Yglp633Gt Rzx0UF0ABoyRjiknjpIg0NAt/X7MqUvug2O4xrXShclF/7abqoWgQGKn3qBqd31Av5oG 2xfw== X-Gm-Message-State: AOAM530d/NySN0BrRqr4lJzQ3XgL4tH/kQFOu/psR46hXLpGttWmLNxg rwgudtFbZxr6LMhsnb9ZBF/TeHT90XiHhQ== X-Google-Smtp-Source: ABdhPJwEq/7UHw8xLTNk0fUAcJan3n53Xbz+tNVIZX5JsksVfb3zbQVaKFIKGgAd+mWR8+pDWQk6JA== X-Received: by 2002:a05:620a:bc1:: with SMTP id s1mr14509702qki.49.1638489682409; Thu, 02 Dec 2021 16:01:22 -0800 (PST) Received: from birita.. 
([2804:431:c7cb:30f8:3030:59d3:d31c:ed39]) by smtp.gmail.com with ESMTPSA id m9sm938714qkn.59.2021.12.02.16.01.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Dec 2021 16:01:22 -0800 (PST) To: libc-alpha@sourceware.org, Paul Zimmermann , Wilco Dijkstra Subject: [PATCH v4 09/12] math: Add math-use-builtins-fmin.h Date: Thu, 2 Dec 2021 21:01:00 -0300 Message-Id: <20211203000103.737833-10-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org> References: <20211203000103.737833-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" This allows an architecture to use the compiler builtin instead of the generic implementation. --- math/s_fmin_template.c | 27 ++++++++++++++++++++++++ sysdeps/generic/math-use-builtins-fmin.h | 4 ++++ sysdeps/generic/math-use-builtins.h | 1 + 3 files changed, 32 insertions(+) create mode 100644 sysdeps/generic/math-use-builtins-fmin.h diff --git a/math/s_fmin_template.c b/math/s_fmin_template.c index 565a836266..27c2382f59 100644 --- a/math/s_fmin_template.c +++ b/math/s_fmin_template.c @@ -17,11 +17,38 @@ . 
*/ #include +#include + +#if __HAVE_FLOAT128 +# define USE_BUILTIN_F128 , _Float128 : USE_FMINF128_BUILTIN +# define BUILTIN_F128 , _Float128 : __builtin_fminf128 +#else +# define USE_BUILTIN_F128 +# define BUILTIN_F128 +#endif + +#define USE_BUILTIN(X, Y) \ + _Generic((X), \ + float : USE_FMINF_BUILTIN, \ + double : USE_FMIN_BUILTIN, \ + long double : USE_FMINL_BUILTIN \ + USE_BUILTIN_F128) + +#define BUILTIN(X, Y) \ + _Generic((X), \ + float : __builtin_fminf, \ + double : __builtin_fmin, \ + long double : __builtin_fminl \ + BUILTIN_F128) \ + (X, Y) FLOAT M_DECL_FUNC (__fmin) (FLOAT x, FLOAT y) { + if (USE_BUILTIN (x, y)) + return BUILTIN (x, y); + if (islessequal (x, y)) return x; else if (isgreater (x, y)) diff --git a/sysdeps/generic/math-use-builtins-fmin.h b/sysdeps/generic/math-use-builtins-fmin.h new file mode 100644 index 0000000000..d2383ce00c --- /dev/null +++ b/sysdeps/generic/math-use-builtins-fmin.h @@ -0,0 +1,4 @@ +#define USE_FMIN_BUILTIN 0 +#define USE_FMINF_BUILTIN 0 +#define USE_FMINL_BUILTIN 0 +#define USE_FMINF128_BUILTIN 0 diff --git a/sysdeps/generic/math-use-builtins.h b/sysdeps/generic/math-use-builtins.h index e07bba242f..24fba47575 100644 --- a/sysdeps/generic/math-use-builtins.h +++ b/sysdeps/generic/math-use-builtins.h @@ -35,5 +35,6 @@ #include #include #include +#include #endif /* MATH_USE_BUILTINS_H */ From patchwork Fri Dec 3 00:01:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 48447 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E5D473858010 for ; Fri, 3 Dec 2021 00:08:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E5D473858010 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1638490135; 
bh=HQV+ruWsLkqyXdgfSyKVpNK3zquR2dRrIm4jEQSY96Q=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=fd6zUKKLd9m/IYlbwFWwFDci1kU78p0qxGI3swWm/QU+Om80eIJSwNN3REfPjkAfJ eiOJX/jqXJXa70/FN0mGu2t1vtWhod7lgFPz+Pifzq199rjsEoSxHFxQZIFJb+Oagn du4L6NlDPFK43jvfA/dpEsd5/Tc1QBSMbRPITrpg= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by sourceware.org (Postfix) with ESMTPS id 8E22A385BF86 for ; Fri, 3 Dec 2021 00:01:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8E22A385BF86 Received: by mail-qt1-x836.google.com with SMTP id 8so1578909qtx.5 for ; Thu, 02 Dec 2021 16:01:24 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HQV+ruWsLkqyXdgfSyKVpNK3zquR2dRrIm4jEQSY96Q=; b=v3eLwIEELtJGTCQHHJDVdPQoI8ejwEKRqutpRFO/L9UYvFD7ZOdE9srAkTMuV9LzMu 4fNkCLQEy32btLHlcBRlU4oRQg0VD6ByvUZhvsoPhMhBGbTk2NMWpmqlVOTFZRSLPgRu jBtyX8yPyYBbRcBND3lDUJOuqLZoYKUBp40ATPwWgQU8WzfWpa1H6vuKF6qd5dqpFuYk EK5SlHHcu7ErgfNgwfNUNqNlHgbtRpyfHMTdXvbq8Yv7z53yPfyELQI2gsbtiWZuVG5n orHdYETk8e2DrCOP4BB5EO8GL02aMbFC8jh8Wc6HJFTbpTI5jyShzAsSwhxvp4SPff9r KJ8A== X-Gm-Message-State: AOAM530TvEdgPwyZ/tKXPfFWykdZN/v3SPTfPligZPJ43o81uIxz9ssw 6YAJVD67NeGFcfSTxnQz2YXJoIWej8mqxQ== X-Google-Smtp-Source: ABdhPJzjtqMm0d08rt4DZSy9nyxRK6PC/0FtttnNaCqZ6iIoph4czgVgOREf/kGcFF03T2xVrYwX5w== X-Received: by 2002:ac8:4e56:: with SMTP id e22mr17694058qtw.72.1638489683923; Thu, 02 Dec 2021 16:01:23 -0800 (PST) Received: from birita.. 
([2804:431:c7cb:30f8:3030:59d3:d31c:ed39]) by smtp.gmail.com with ESMTPSA id m9sm938714qkn.59.2021.12.02.16.01.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Dec 2021 16:01:23 -0800 (PST) To: libc-alpha@sourceware.org, Paul Zimmermann , Wilco Dijkstra Subject: [PATCH v4 10/12] aarch64: Add math-use-builtins-f{max,min}.h Date: Thu, 2 Dec 2021 21:01:01 -0300 Message-Id: <20211203000103.737833-11-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org> References: <20211203000103.737833-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" This allows the arch-specific implementations to be removed. 
--- sysdeps/aarch64/fpu/math-use-builtins-fmax.h | 4 +++ sysdeps/aarch64/fpu/math-use-builtins-fmin.h | 4 +++ sysdeps/aarch64/fpu/s_fmax.c | 28 -------------------- sysdeps/aarch64/fpu/s_fmaxf.c | 28 -------------------- sysdeps/aarch64/fpu/s_fmin.c | 28 -------------------- sysdeps/aarch64/fpu/s_fminf.c | 28 -------------------- 6 files changed, 8 insertions(+), 112 deletions(-) create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmax.h b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h new file mode 100644 index 0000000000..6b9e7b7692 --- /dev/null +++ b/sysdeps/aarch64/fpu/math-use-builtins-fmax.h @@ -0,0 +1,4 @@ +#define USE_FMAX_BUILTIN 1 +#define USE_FMAXF_BUILTIN 1 +#define USE_FMAXL_BUILTIN 0 +#define USE_FMAXF128_BUILTIN 0 diff --git a/sysdeps/aarch64/fpu/math-use-builtins-fmin.h b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h new file mode 100644 index 0000000000..7fd6b45fce --- /dev/null +++ b/sysdeps/aarch64/fpu/math-use-builtins-fmin.h @@ -0,0 +1,4 @@ +#define USE_FMIN_BUILTIN 1 +#define USE_FMINF_BUILTIN 1 +#define USE_FMINL_BUILTIN 0 +#define USE_FMINF128_BUILTIN 0 diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c deleted file mode 100644 index 7e3593fbda..0000000000 --- a/sysdeps/aarch64/fpu/s_fmax.c +++ /dev/null @@ -1,28 +0,0 @@ -/* Copyright (C) 2011-2021 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of the - License, or (at your option) any later version. 
- - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include -#include - -double -__fmax (double x, double y) -{ - return __builtin_fmax (x, y); -} - -libm_alias_double (__fmax, fmax) diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c deleted file mode 100644 index eb4c469ef8..0000000000 --- a/sysdeps/aarch64/fpu/s_fmaxf.c +++ /dev/null @@ -1,28 +0,0 @@ -/* Copyright (C) 2011-2021 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include -#include - -float -__fmaxf (float x, float y) -{ - return __builtin_fmaxf (x, y); -} - -libm_alias_float (__fmax, fmax) diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c deleted file mode 100644 index efdee1ed39..0000000000 --- a/sysdeps/aarch64/fpu/s_fmin.c +++ /dev/null @@ -1,28 +0,0 @@ -/* Copyright (C) 1996-2021 Free Software Foundation, Inc. - - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include -#include - -double -__fmin (double x, double y) -{ - return __builtin_fmin (x, y); -} - -libm_alias_double (__fmin, fmin) diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c deleted file mode 100644 index 3665fabb6b..0000000000 --- a/sysdeps/aarch64/fpu/s_fminf.c +++ /dev/null @@ -1,28 +0,0 @@ -/* Copyright (C) 2011-2021 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . 
*/ - -#include -#include - -float -__fminf (float x, float y) -{ - return __builtin_fminf (x, y); -} - -libm_alias_float (__fmin, fmin) From patchwork Fri Dec 3 00:01:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 48448 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 21860385BF9B for ; Fri, 3 Dec 2021 00:09:38 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 21860385BF9B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1638490178; bh=GtntpGtdqgHfX0Fmy9BZYfmITV+lNia4vCnvI35/sfA=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=TrdApB51tuJquikda20vgHVT2sbFCc5cBBDdn6rO+yATD3N8qjpeHUn6hdmWob9WI w/l2JzINdCFgKQ/BzqRMZ08UBsPyRGO9W47ihLcxIdIF6HerCemgSKJG0zmeazUCfE d1FrVU0aAqsCyv7IF8p/GoN6AXpeeE4hjbdCffLM= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by sourceware.org (Postfix) with ESMTPS id D828F385C41A for ; Fri, 3 Dec 2021 00:01:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D828F385C41A Received: by mail-qt1-x82e.google.com with SMTP id p19so1533723qtw.12 for ; Thu, 02 Dec 2021 16:01:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GtntpGtdqgHfX0Fmy9BZYfmITV+lNia4vCnvI35/sfA=; b=8Ir+PFuLPj7BnNhpD3muBMSgeoZST9i+WYTIHeyc6xl2EGn1c0RzmCEwHgbe7L3Mqn 2D6wPZiPcLshPH08bWzo6RdQU0krgUNjeuLHvXvaDvwstPo0AOTnz4baOlbu5DdYYcIe 
HfFGl1gGMtB1sEFlUofMVAN6vB1zCGhEOwwk8vipHNCeG89+0+tFlr0CLJ1LIIMmkWUx FZ5I/9lN7PrZB//VqA9BTJcs8cPRjw5hrjL4T4NIWrYkLAxLvHj6aMEyUDScsDdfgs4q uCYyCElowgS8gYMovZFAHIx6WH0i4d62whUz8yPopT3/+OXEtj3Rij6Y2+w8aUhCKdfY gbFg== X-Gm-Message-State: AOAM5323kZUCOGiZx0S3y4sIKx/ttu40FWppI2mTpEQvMik+y67QSFx7 RfKCWfs6Hw6Zk6iDHF516Yv9I8A504hUxw== X-Google-Smtp-Source: ABdhPJztx1R+IRYyg8Js+IKsjcOM5H742HVPCRbt6WAcngal+wfzwU/tJbrZoze0X9tqOoVUSdd0Ww== X-Received: by 2002:a05:622a:30b:: with SMTP id q11mr17433586qtw.348.1638489685316; Thu, 02 Dec 2021 16:01:25 -0800 (PST) Received: from birita.. ([2804:431:c7cb:30f8:3030:59d3:d31c:ed39]) by smtp.gmail.com with ESMTPSA id m9sm938714qkn.59.2021.12.02.16.01.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Dec 2021 16:01:25 -0800 (PST) To: libc-alpha@sourceware.org, Paul Zimmermann , Wilco Dijkstra Subject: [PATCH v4 11/12] math: Use fmin/fmax on hypot Date: Thu, 2 Dec 2021 21:01:02 -0300 Message-Id: <20211203000103.737833-12-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org> References: <20211203000103.737833-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" From: Wilco Dijkstra This optimizes hypot for architectures that provide fast fmin/fmax builtins.
Checked on aarch64-linux-gnu. --- sysdeps/ieee754/dbl-64/e_hypot.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c index f53061badc..ce51784d27 100644 --- a/sysdeps/ieee754/dbl-64/e_hypot.c +++ b/sysdeps/ieee754/dbl-64/e_hypot.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include "math_config.h" @@ -95,8 +96,8 @@ __ieee754_hypot (double x, double y) x = fabs (x); y = fabs (y); - double ax = x < y ? y : x; - double ay = x < y ? x : y; + double ax = USE_FMAX_BUILTIN ? fmax (x, y) : (x < y ? y : x); + double ay = USE_FMIN_BUILTIN ? fmin (x, y) : (x < y ? x : y); /* If ax is huge, scale both inputs down. */ if (__glibc_unlikely (ax > LARGE_VAL)) From patchwork Fri Dec 3 00:01:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 48449 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C20D93857810 for ; Fri, 3 Dec 2021 00:10:20 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C20D93857810 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1638490220; bh=Kk8tjXd97eS6fge9ijG4k11I1iyly41lMXtd9Xx0k50=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=YjJfK0j4amdUQNHrk3S/CMipwBNh9v9bA9J3SdC377q8yNuMcyjnXjGSc398f0Zhz Li3qVOfk8K0MK5/Q7XBxpNzbbTXf2AI0hCyIHUG3lAkpONxBhJJpBs/cG75aCYrI0N s/0AbaxjUlwPov9gVYLc6Co2ENxQ0j8GLsQ2JmGQ= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qt1-x833.google.com (mail-qt1-x833.google.com [IPv6:2607:f8b0:4864:20::833]) by sourceware.org (Postfix) with ESMTPS id C0F85385780D for ; Fri, 3 Dec 2021 
00:01:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C0F85385780D Received: by mail-qt1-x833.google.com with SMTP id j17so1596258qtx.2 for ; Thu, 02 Dec 2021 16:01:27 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Kk8tjXd97eS6fge9ijG4k11I1iyly41lMXtd9Xx0k50=; b=oq4mvgvgfWpiJBTcLOdjdy1eH9QBYHiZozrixcrMvHnpk0ju8Suno18dMwBR1i28JH sbidPlhNgbZFwjBsUzTREsxT3O8EIZfJZ6zs2T1P7/7gZL34fY5ONZLUnyr3ckpDNY73 RSH9VOVGtqWIr4JGM3kJqaAtGvCkqJUe5IDik1iFek4Tco3/N8tso3I402PdVAf1Rlwj 1Rmji/jqB5LuE8Pd+4AQwam43M1/q+mtpT0XfKl459kBOXeDYewbTHJa9mYOuoxUZ2CF /lQJwwfSPCBfX4maxf6wDQWtB2kBpwZpW73Kz6Ym3Cpbvh0V5YP/ItWYAqZ+JTUUcguf Tubw== X-Gm-Message-State: AOAM5303j4k8gYGij+HG7U3pQkseYcvclQii2Xs0jD3jy7HzM8Tjd30o gp08dO2bLfq50eeaG1lU/7vyf7v1Tpjk5w== X-Google-Smtp-Source: ABdhPJzWip5C7NM+fHd4dD6qrj4X/XQJ7DuB60FXY8L9mYhQ7RhZvwZe/GboEWLiwcS/2dIx10t9sw== X-Received: by 2002:a05:622a:1828:: with SMTP id t40mr17676210qtc.0.1638489686852; Thu, 02 Dec 2021 16:01:26 -0800 (PST) Received: from birita.. 
([2804:431:c7cb:30f8:3030:59d3:d31c:ed39]) by smtp.gmail.com with ESMTPSA id m9sm938714qkn.59.2021.12.02.16.01.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Dec 2021 16:01:26 -0800 (PST) To: libc-alpha@sourceware.org, Paul Zimmermann , Wilco Dijkstra Subject: [PATCH v4 12/12] math: Remove the error handling wrapper from hypot and hypotf Date: Thu, 2 Dec 2021 21:01:03 -0300 Message-Id: <20211203000103.737833-13-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211203000103.737833-1-adhemerval.zanella@linaro.org> References: <20211203000103.737833-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The error handling is moved to the sysdeps/ieee754 version, with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version or compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch-specific __libm_error_region in its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
--- math/Versions | 2 ++ math/w_hypot.c | 8 ++++++ math/w_hypot_compat.c | 13 +++++++--- math/w_hypotf.c | 8 ++++++ math/w_hypotf_compat.c | 6 ++--- sysdeps/i386/fpu/e_hypot.c | 14 +++++++++- sysdeps/ieee754/dbl-64/e_hypot.c | 26 +++++++++++++++---- sysdeps/ieee754/dbl-64/w_hypot.c | 1 + sysdeps/ieee754/flt-32/e_hypotf.c | 21 +++++++++++---- sysdeps/ieee754/flt-32/w_hypotf.c | 1 + sysdeps/mach/hurd/i386/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/aarch64/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/alpha/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/arm/be/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/arm/le/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/hppa/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/i386/libm.abilist | 2 ++ .../sysv/linux/m68k/coldfire/libm.abilist | 2 ++ .../unix/sysv/linux/m68k/m680x0/libm.abilist | 2 ++ .../sysv/linux/microblaze/be/libm.abilist | 2 ++ .../sysv/linux/microblaze/le/libm.abilist | 2 ++ .../unix/sysv/linux/mips/mips32/libm.abilist | 2 ++ .../unix/sysv/linux/mips/mips64/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/nios2/libm.abilist | 2 ++ .../linux/powerpc/powerpc32/fpu/libm.abilist | 2 ++ .../powerpc/powerpc32/nofpu/libm.abilist | 2 ++ .../linux/powerpc/powerpc64/be/libm.abilist | 2 ++ .../linux/powerpc/powerpc64/le/libm.abilist | 2 ++ .../unix/sysv/linux/s390/s390-32/libm.abilist | 2 ++ .../unix/sysv/linux/s390/s390-64/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/sh/be/libm.abilist | 2 ++ sysdeps/unix/sysv/linux/sh/le/libm.abilist | 2 ++ .../sysv/linux/sparc/sparc32/libm.abilist | 2 ++ .../sysv/linux/sparc/sparc64/libm.abilist | 2 ++ .../unix/sysv/linux/x86_64/64/libm.abilist | 2 ++ .../unix/sysv/linux/x86_64/x32/libm.abilist | 2 ++ 36 files changed, 135 insertions(+), 17 deletions(-) create mode 100644 math/w_hypot.c create mode 100644 math/w_hypotf.c create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c diff --git a/math/Versions b/math/Versions index a4b5405ddc..fef7a08c3b 100644 --- 
a/math/Versions +++ b/math/Versions @@ -628,5 +628,7 @@ libm { fminimum_numf64x; fminimum_numf128; fminimum_magf64x; fminimum_magf128; fminimum_mag_numf64x; fminimum_mag_numf128; + # No SVID compatible error handling. + hypotf; hypot; } } diff --git a/math/w_hypot.c b/math/w_hypot.c new file mode 100644 index 0000000000..66f723a896 --- /dev/null +++ b/math/w_hypot.c @@ -0,0 +1,8 @@ +#include +#undef __USE_WRAPPER_TEMPLATE +#define __USE_WRAPPER_TEMPLATE 1 +#undef declare_mgen_alias +#define declare_mgen_alias(a, b) +#include +versioned_symbol (libm, __hypot, hypot, GLIBC_2_35); +libm_alias_float_other (__hypot, hypot) diff --git a/math/w_hypot_compat.c b/math/w_hypot_compat.c index f07039cc51..ec983a4ab8 100644 --- a/math/w_hypot_compat.c +++ b/math/w_hypot_compat.c @@ -20,9 +20,9 @@ #include -#if LIBM_SVID_COMPAT +#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35) double -__hypot (double x, double y) +__hypot_compat (double x, double y) { double z = __ieee754_hypot(x,y); if(__builtin_expect(!isfinite(z), 0) @@ -31,5 +31,12 @@ __hypot (double x, double y) return z; } -libm_alias_double (__hypot, hypot) +compat_symbol (libm, __hypot_compat, hypot, GLIBC_2_0); +# ifdef NO_LONG_DOUBLE +weak_alias (__hypot_compat, hypotl) +# endif +# ifdef LONG_DOUBLE_COMPAT +LONG_DOUBLE_COMPAT_CHOOSE_libm_hypotl ( + compat_symbol (libm, __hypot_compat, hypotl, FIRST_VERSION_libm_hypotl), ); +# endif #endif diff --git a/math/w_hypotf.c b/math/w_hypotf.c new file mode 100644 index 0000000000..b15a9b06d0 --- /dev/null +++ b/math/w_hypotf.c @@ -0,0 +1,8 @@ +#include +#undef __USE_WRAPPER_TEMPLATE +#define __USE_WRAPPER_TEMPLATE 1 +#undef declare_mgen_alias +#define declare_mgen_alias(a, b) +#include +versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35); +libm_alias_float_other (__hypotf, hypotf) diff --git a/math/w_hypotf_compat.c b/math/w_hypotf_compat.c index 05898d3420..2bde4553b0 100644 --- a/math/w_hypotf_compat.c +++ b/math/w_hypotf_compat.c @@ -22,9 +22,9 @@ 
#include -#if LIBM_SVID_COMPAT +#if LIBM_SVID_COMPAT && SHLIB_COMPAT (libm, GLIBC_2_0, GLIBC_2_35) float -__hypotf(float x, float y) +__hypotf_compat (float x, float y) { float z = __ieee754_hypotf(x,y); if(__builtin_expect(!isfinite(z), 0) @@ -34,5 +34,5 @@ __hypotf(float x, float y) return z; } -libm_alias_float (__hypot, hypot) +compat_symbol (libm, __hypotf_compat, hypotf, GLIBC_2_0); #endif diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c index b7c068e734..a0c5734b68 100644 --- a/sysdeps/i386/fpu/e_hypot.c +++ b/sysdeps/i386/fpu/e_hypot.c @@ -20,14 +20,17 @@ #include #include #include +#include #include +#include #include +#include /* The i386 allows to use the default excess of precision to optimize the hypot implementation, since internal multiplication and sqrt is carried with 80-bit FP type. */ double -__ieee754_hypot (double x, double y) +__hypot (double x, double y) { if (!isfinite (x) || !isfinite (y)) { @@ -41,6 +44,15 @@ __ieee754_hypot (double x, double y) long double ly = y; double r = math_narrow_eval (sqrtl (lx * lx + ly * ly)); math_check_force_underflow_nonneg (r); + if (isinf (r)) + __set_errno (ERANGE); return r; } +strong_alias (__hypot, __ieee754_hypot) +#if LIBM_SVID_COMPAT +versioned_symbol (libm, __hypot, hypot, GLIBC_2_35); libm_alias_finite (__ieee754_hypot, __hypot) +libm_alias_double_other (__hypot, hypot) +#else +libm_alias_double (__hypot, hypot) +#endif diff --git a/sysdeps/ieee754/dbl-64/e_hypot.c b/sysdeps/ieee754/dbl-64/e_hypot.c index ce51784d27..54f936c82b 100644 --- a/sysdeps/ieee754/dbl-64/e_hypot.c +++ b/sysdeps/ieee754/dbl-64/e_hypot.c @@ -34,12 +34,15 @@ [1] https://arxiv.org/pdf/1904.09481.pdf */ +#include #include #include #include #include #include +#include #include +#include #include "math_config.h" #define SCALE 0x1p-600 @@ -47,6 +50,14 @@ #define TINY_VAL 0x1p-459 #define EPS 0x1p-54 +static inline double +handle_errno (double r) +{ + if (isinf (r)) + __set_errno (ERANGE); + return r; +} + /* 
Hypot kernel. The inputs must be adjusted so that ax >= ay >= 0 and squaring ax, ay and (ax - ay) does not overflow or underflow. */ static inline double @@ -83,7 +94,7 @@ kernel (double ax, double ay) } double -__ieee754_hypot (double x, double y) +__hypot (double x, double y) { if (!isfinite(x) || !isfinite(y)) { @@ -103,9 +114,9 @@ __ieee754_hypot (double x, double y) if (__glibc_unlikely (ax > LARGE_VAL)) { if (__glibc_unlikely (ay <= ax * EPS)) - return ax + ay; - - return kernel (ax * SCALE, ay * SCALE) / SCALE; + return handle_errno (ax + ay); + + return handle_errno (kernel (ax * SCALE, ay * SCALE) / SCALE); } /* If ay is tiny, scale both inputs up. */ @@ -125,6 +136,11 @@ __ieee754_hypot (double x, double y) return kernel (ax, ay); } -#ifndef __ieee754_hypot +strong_alias (__hypot, __ieee754_hypot) libm_alias_finite (__ieee754_hypot, __hypot) +#if LIBM_SVID_COMPAT +versioned_symbol (libm, __hypot, hypot, GLIBC_2_35); +libm_alias_double_other (__hypot, hypot) +#else +libm_alias_double (__hypot, hypot) #endif diff --git a/sysdeps/ieee754/dbl-64/w_hypot.c b/sysdeps/ieee754/dbl-64/w_hypot.c new file mode 100644 index 0000000000..1cc8931700 --- /dev/null +++ b/sysdeps/ieee754/dbl-64/w_hypot.c @@ -0,0 +1 @@ +/* Not needed. */ diff --git a/sysdeps/ieee754/flt-32/e_hypotf.c b/sysdeps/ieee754/flt-32/e_hypotf.c index 1d082fe36c..809c00c11e 100644 --- a/sysdeps/ieee754/flt-32/e_hypotf.c +++ b/sysdeps/ieee754/flt-32/e_hypotf.c @@ -16,14 +16,17 @@ License along with the GNU C Library; if not, see . 
*/ +#include #include +#include +#include #include #include "math_config.h" #include #include float -__ieee754_hypotf (float x, float y) +__hypotf (float x, float y) { if (!isfinite(x) || !isfinite(y)) { @@ -33,9 +36,17 @@ __ieee754_hypotf (float x, float y) return x + y; } - return math_narrow_eval (sqrt ((double) x * (double) x - + (double) y * (double) y)); + float r = math_narrow_eval (sqrt ((double) x * (double) x + + (double) y * (double) y)); + if (!isfinite (r)) + __set_errno (ERANGE); + return r; } -#ifndef __ieee754_hypotf -libm_alias_finite (__ieee754_hypotf, __hypotf) +strong_alias (__hypotf, __ieee754_hypotf) +#if LIBM_SVID_COMPAT +versioned_symbol (libm, __hypotf, hypotf, GLIBC_2_35); +libm_alias_float_other (__hypot, hypot) +#else +libm_alias_float (__hypot, hypot) #endif +libm_alias_finite (__ieee754_hypotf, __hypotf) diff --git a/sysdeps/ieee754/flt-32/w_hypotf.c b/sysdeps/ieee754/flt-32/w_hypotf.c new file mode 100644 index 0000000000..1cc8931700 --- /dev/null +++ b/sysdeps/ieee754/flt-32/w_hypotf.c @@ -0,0 +1 @@ +/* Not needed. 
*/ diff --git a/sysdeps/mach/hurd/i386/libm.abilist b/sysdeps/mach/hurd/i386/libm.abilist index abf91bd142..8f40ddb150 100644 --- a/sysdeps/mach/hurd/i386/libm.abilist +++ b/sysdeps/mach/hurd/i386/libm.abilist @@ -1179,3 +1179,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/aarch64/libm.abilist b/sysdeps/unix/sysv/linux/aarch64/libm.abilist index 1cef7d3db7..c2e3c6453e 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libm.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libm.abilist @@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/alpha/libm.abilist b/sysdeps/unix/sysv/linux/alpha/libm.abilist index 59d51021fa..4f85b6180f 100644 --- a/sysdeps/unix/sysv/linux/alpha/libm.abilist +++ b/sysdeps/unix/sysv/linux/alpha/libm.abilist @@ -1201,6 +1201,8 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 __clog10l F GLIBC_2.4 __finitel F GLIBC_2.4 __fpclassifyl F diff --git a/sysdeps/unix/sysv/linux/arm/be/libm.abilist b/sysdeps/unix/sysv/linux/arm/be/libm.abilist index 44666ad7cd..36190add84 100644 --- a/sysdeps/unix/sysv/linux/arm/be/libm.abilist +++ b/sysdeps/unix/sysv/linux/arm/be/libm.abilist @@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 _LIB_VERSION D 0x4 GLIBC_2.4 __clog10 F GLIBC_2.4 __clog10f F diff --git a/sysdeps/unix/sysv/linux/arm/le/libm.abilist b/sysdeps/unix/sysv/linux/arm/le/libm.abilist index 44666ad7cd..36190add84 100644 --- a/sysdeps/unix/sysv/linux/arm/le/libm.abilist +++ b/sysdeps/unix/sysv/linux/arm/le/libm.abilist @@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F 
GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 _LIB_VERSION D 0x4 GLIBC_2.4 __clog10 F GLIBC_2.4 __clog10f F diff --git a/sysdeps/unix/sysv/linux/hppa/libm.abilist b/sysdeps/unix/sysv/linux/hppa/libm.abilist index 35d316a720..b5dd4e851f 100644 --- a/sysdeps/unix/sysv/linux/hppa/libm.abilist +++ b/sysdeps/unix/sysv/linux/hppa/libm.abilist @@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 exp2l F diff --git a/sysdeps/unix/sysv/linux/i386/libm.abilist b/sysdeps/unix/sysv/linux/i386/libm.abilist index ef99b3e104..5d89aaa08e 100644 --- a/sysdeps/unix/sysv/linux/i386/libm.abilist +++ b/sysdeps/unix/sysv/linux/i386/libm.abilist @@ -1186,3 +1186,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist index 44666ad7cd..36190add84 100644 --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist @@ -531,6 +531,8 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 _LIB_VERSION D 0x4 GLIBC_2.4 __clog10 F GLIBC_2.4 __clog10f F diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist index 58316c96ae..e7cd739a54 100644 --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist @@ -882,3 +882,5 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist index b5e5da0272..274ecff630 100644 --- 
a/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/be/libm.abilist @@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist index b5e5da0272..274ecff630 100644 --- a/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/le/libm.abilist @@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist index 4113d3170d..08b902118d 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/libm.abilist @@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 exp2l F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist index 18fe9cc57a..09bb3bd75b 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/libm.abilist @@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/nios2/libm.abilist b/sysdeps/unix/sysv/linux/nios2/libm.abilist index 3a2b34ecc2..11abbb5668 100644 --- a/sysdeps/unix/sysv/linux/nios2/libm.abilist +++ b/sysdeps/unix/sysv/linux/nios2/libm.abilist @@ -843,3 +843,5 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist 
b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist index 740cc8f55b..1688809c36 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist @@ -888,6 +888,8 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 __clog10l F GLIBC_2.4 __finitel F GLIBC_2.4 __fpclassifyl F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist index 16fb30566b..e880cebd78 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist @@ -887,6 +887,8 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 __clog10l F GLIBC_2.4 __finitel F GLIBC_2.4 __fpclassifyl F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist index ad4b98c09a..033385dfc1 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist @@ -881,6 +881,8 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 __clog10l F GLIBC_2.4 __finitel F GLIBC_2.4 __fpclassifyl F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist index 955765051c..7923d428bc 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist @@ -1316,3 +1316,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist 
b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist index 1f5bd7754d..9a84163089 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist @@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 __clog10l F GLIBC_2.4 __finitel F GLIBC_2.4 __fpclassifyl F diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist index 0b18481f39..174bde4fa0 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist @@ -1145,6 +1145,8 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 __clog10l F GLIBC_2.4 __finitel F GLIBC_2.4 __fpclassifyl F diff --git a/sysdeps/unix/sysv/linux/sh/be/libm.abilist b/sysdeps/unix/sysv/linux/sh/be/libm.abilist index f525a9e77e..1e1324d667 100644 --- a/sysdeps/unix/sysv/linux/sh/be/libm.abilist +++ b/sysdeps/unix/sysv/linux/sh/be/libm.abilist @@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 exp2l F diff --git a/sysdeps/unix/sysv/linux/sh/le/libm.abilist b/sysdeps/unix/sysv/linux/sh/le/libm.abilist index f525a9e77e..1e1324d667 100644 --- a/sysdeps/unix/sysv/linux/sh/le/libm.abilist +++ b/sysdeps/unix/sysv/linux/sh/le/libm.abilist @@ -842,4 +842,6 @@ GLIBC_2.35 fminimumf64 F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 exp2l F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist index 727d1ce707..217e6eff7f 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist @@ -1152,6 +1152,8 @@ 
GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F GLIBC_2.4 __clog10l F GLIBC_2.4 __finitel F GLIBC_2.4 __fpclassifyl F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist index 0f57574523..6b53b0c59f 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist @@ -1144,3 +1144,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist index 574789e061..dbefbc3a1a 100644 --- a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist @@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist index 1dc89b304d..8001d0f219 100644 --- a/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist @@ -1177,3 +1177,5 @@ GLIBC_2.35 fminimumf64x F GLIBC_2.35 fminimuml F GLIBC_2.35 fsqrt F GLIBC_2.35 fsqrtl F +GLIBC_2.35 hypot F +GLIBC_2.35 hypotf F