From patchwork Fri Dec 3 00:00:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 48437 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AE42F3857831 for ; Fri, 3 Dec 2021 00:01:32 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AE42F3857831 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1638489692; bh=TZ/y7+W0dmqos1TXttdm91gYv1KGxTzLukwzqdSq5+o=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=K5Nnsh9cV2A4R8oKzWGq4FI3+Q5Ykn18L/iWciWTtJdr3jGlwk0zqU4rOE1aiDWJg ChKhC+g//I9hJ6iv1LwvI1HalGv5gW5lMs4c949Zn7gNv6UcTC2RgwdcIQq1Vohz8D ERhIXF5egtmUehPxe19RaP74a3OM91EJaEbdNH8g= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qk1-x733.google.com (mail-qk1-x733.google.com [IPv6:2607:f8b0:4864:20::733]) by sourceware.org (Postfix) with ESMTPS id 804123858428 for ; Fri, 3 Dec 2021 00:01:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 804123858428 Received: by mail-qk1-x733.google.com with SMTP id 193so1698292qkh.10 for ; Thu, 02 Dec 2021 16:01:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=TZ/y7+W0dmqos1TXttdm91gYv1KGxTzLukwzqdSq5+o=; b=wNz11Ibz/GggHV47brR9oPWuHXBqYDI0/RyI1r+lTSTATT1FJiNHtOr8oLb8ZGkzdG 5DHrZYxuMo/AiSyOtAi4AqRwOTgDrBesccsOm98kb7XW+Ybd8HRz8jZVjRZb0Rd3oDeE o/xge+N6PhDFgyUrOAZxbEiG9nFAhn9G5S694PzuYB1h4c+MvDl0+Eqk2eOOdPumRP19 DA09kTROwlOMLBdvsDVI7SsjkCNJtPewuVqEbpWQMIKXHbDJMUrwOQkAZk95BsY4zG0c MkQUudXWDKATu5u1WEgqoHvuy5rWr3fiP61dDDF2entGzYeoycCHIgB1c4yS+fiGLEPA NWhg== X-Gm-Message-State: AOAM530qY9Tlar8B/Gcgza9AAqFLdIxpJl5SfvrLI+F703l7jsZNloZw cH8ixsvfx2KbHIqFExH3297ygLbbWRkU6A== X-Google-Smtp-Source: ABdhPJxRegdSPo7ZNNsKSazDbnlqegUc6tAj4Ymd3gnc6DozKvWk4DdEko1cQifFkT2XA53kGlCqIA== X-Received: by 2002:a05:620a:1986:: with SMTP id bm6mr14899774qkb.217.1638489666630; Thu, 02 Dec 2021 16:01:06 -0800 (PST) Received: from birita.. ([2804:431:c7cb:30f8:3030:59d3:d31c:ed39]) by smtp.gmail.com with ESMTPSA id m9sm938714qkn.59.2021.12.02.16.01.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Dec 2021 16:01:06 -0800 (PST) To: libc-alpha@sourceware.org, Paul Zimmermann , Wilco Dijkstra Subject: [PATCH v4 00/12] Improve hypot Date: Thu, 2 Dec 2021 21:00:51 -0300 Message-Id: <20211203000103.737833-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 MIME-Version: 1.0 X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" This patchset add a different algorithm along with Wilco [1] performance improvements. The default implementation is based on the 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [2] with some fixes and improvements (although the dbl-64 one uses a slight different approach when fast FMA is avaliable). This method is also used by Julia language runtime [2]. The motivation for this change are: 1. Use a newer algotihm that favor FP operations over interger ones (which tends to show better performance specially on newer processor with multiple FP units). It also allows consolidate the multiple implementation (for instance, the powerpc one). 2. The new algorithm is more precise without minimum performance difference. The current hypot() implementation seems already to be bounded to a maximum of 1 ulp of error, however the new proposed algorithm shows an slight precision improvement by showing more correctly rounded results. With a random 1e9 inputs for different float format I see: - An improvement from 3427362 to 18457 results with 1 ulp of error for Binary64. - An improvement from 233442 to 1274 results with 1 ulp of error for Binary96 (x86_64). - An improvement from 453045 to 1294 results with 1 ulp of error for Binary96 (x86_64). Also for the maximal known error master shows (in ulps, with corresponding inputs), determined with [3]: binary32 0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 binary64 0.987 -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022 binary96 0.981 0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l binary128 0.985 -0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384 With the new implementation: binary32 0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 [same] binary64 0.792 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022 binary96 0.584 -0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l binary128 0.749 0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192 If FMA is uses the binary64 shows a slight worse precision: I have adapted the dbl-64, ldbl-96, and ldbl-128, the flt-32 is not required since it calls the dbl-64 one. I have not adapated ldbl-128ibm since the format has a lot of caveats and IBM aims to move to ldbl-128. [1] https://sourceware.org/pipermail/libc-alpha/2021-November/133523.html [1] https://arxiv.org/pdf/1904.09481.pdf [2] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c [3] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c Adhemerval Zanella (9): math: Simplify hypotf implementation math: Use an improved algorithm for hypotl (ldbl-96) math: Use an improved algorithm for hypotl (ldbl-128) math: Remove powerpc e_hypot i386: Move hypot implementation to C math: Add math-use-builtinds-fmax.h math: Add math-use-builtinds-fmin.h aarch64: Add math-use-builtins-f{max,min}.h math: Remove the error handling wrapper from hypot and hypotf Wilco Dijkstra (3): math: Use an improved algorithm for hypot (dbl-64) math: Improve hypot performance with FMA math: Use fmin/fmax on hypot math/Versions | 2 + math/s_fmax_template.c | 27 ++ math/s_fmin_template.c | 27 ++ math/w_hypot.c | 8 + math/w_hypot_compat.c | 13 +- math/w_hypotf.c | 8 + math/w_hypotf_compat.c | 6 +- sysdeps/aarch64/fpu/math-use-builtins-fmax.h | 4 + sysdeps/aarch64/fpu/math-use-builtins-fmin.h | 4 + sysdeps/aarch64/fpu/s_fmax.c | 28 -- sysdeps/aarch64/fpu/s_fmaxf.c | 28 -- sysdeps/aarch64/fpu/s_fmin.c | 28 -- sysdeps/aarch64/fpu/s_fminf.c | 28 -- sysdeps/generic/math-use-builtins-fmax.h | 4 + sysdeps/generic/math-use-builtins-fmin.h | 4 + sysdeps/generic/math-use-builtins.h | 2 + sysdeps/i386/fpu/e_hypot.S | 75 ----- sysdeps/i386/fpu/e_hypot.c | 58 ++++ sysdeps/i386/fpu/e_hypotf.S | 64 ----- sysdeps/ieee754/dbl-64/e_hypot.c | 269 ++++++++---------- sysdeps/ieee754/dbl-64/w_hypot.c | 1 + sysdeps/ieee754/flt-32/e_hypotf.c | 79 ++--- sysdeps/ieee754/flt-32/math_config.h | 9 + sysdeps/ieee754/flt-32/w_hypotf.c | 1 + sysdeps/ieee754/ldbl-128/e_hypotl.c | 226 +++++++-------- sysdeps/ieee754/ldbl-96/e_hypotl.c | 231 +++++++-------- sysdeps/mach/hurd/i386/libm.abilist | 2 + sysdeps/powerpc/fpu/e_hypot.c | 87 ------ sysdeps/powerpc/fpu/e_hypotf.c | 78 ----- .../powerpc32/power4/fpu/multiarch/Makefile | 5 +- .../power4/fpu/multiarch/e_hypot-power7.c | 23 -- .../power4/fpu/multiarch/e_hypot-ppc32.c | 23 -- .../powerpc32/power4/fpu/multiarch/e_hypot.c | 33 --- .../power4/fpu/multiarch/e_hypotf-power7.c | 23 -- .../power4/fpu/multiarch/e_hypotf-ppc32.c | 23 -- .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 --- sysdeps/unix/sysv/linux/aarch64/libm.abilist | 2 + sysdeps/unix/sysv/linux/alpha/libm.abilist | 2 + sysdeps/unix/sysv/linux/arm/be/libm.abilist | 2 + sysdeps/unix/sysv/linux/arm/le/libm.abilist | 2 + sysdeps/unix/sysv/linux/hppa/libm.abilist | 2 + sysdeps/unix/sysv/linux/i386/libm.abilist | 2 + .../sysv/linux/m68k/coldfire/libm.abilist | 2 + .../unix/sysv/linux/m68k/m680x0/libm.abilist | 2 + .../sysv/linux/microblaze/be/libm.abilist | 2 + .../sysv/linux/microblaze/le/libm.abilist | 2 + .../unix/sysv/linux/mips/mips32/libm.abilist | 2 + .../unix/sysv/linux/mips/mips64/libm.abilist | 2 + sysdeps/unix/sysv/linux/nios2/libm.abilist | 2 + .../linux/powerpc/powerpc32/fpu/libm.abilist | 2 + .../powerpc/powerpc32/nofpu/libm.abilist | 2 + .../linux/powerpc/powerpc64/be/libm.abilist | 2 + .../linux/powerpc/powerpc64/le/libm.abilist | 2 + .../unix/sysv/linux/s390/s390-32/libm.abilist | 2 + .../unix/sysv/linux/s390/s390-64/libm.abilist | 2 + sysdeps/unix/sysv/linux/sh/be/libm.abilist | 2 + sysdeps/unix/sysv/linux/sh/le/libm.abilist | 2 + .../sysv/linux/sparc/sparc32/libm.abilist | 2 + .../sysv/linux/sparc/sparc64/libm.abilist | 2 + .../unix/sysv/linux/x86_64/64/libm.abilist | 2 + .../unix/sysv/linux/x86_64/x32/libm.abilist | 2 + 61 files changed, 586 insertions(+), 1028 deletions(-) create mode 100644 math/w_hypot.c create mode 100644 math/w_hypotf.c create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c create mode 100644 sysdeps/generic/math-use-builtins-fmax.h create mode 100644 sysdeps/generic/math-use-builtins-fmin.h delete mode 100644 sysdeps/i386/fpu/e_hypot.S create mode 100644 sysdeps/i386/fpu/e_hypot.c delete mode 100644 sysdeps/i386/fpu/e_hypotf.S create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c