From patchwork Tue Dec 7 19:03:41 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 48595
To: libc-alpha@sourceware.org, Paul Zimmermann, Wilco Dijkstra
Subject: [PATCH v5 00/12] Improve hypot
Date: Tue, 7 Dec 2021 16:03:41 -0300
Message-Id: <20211207190353.3282666-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella

This patchset adds a different algorithm along with Wilco's performance
improvements [1].  The default implementation is based on 'An Improved
Algorithm for hypot(a,b)' by Carlos F. Borges [2], with some fixes and
improvements (although the dbl-64 one uses a slightly different approach
when fast FMA is available).  This method is also used by the Julia
language runtime [3].

The motivations for this change are:

1. Use a newer algorithm that favors FP operations over integer ones
   (which tends to show better performance, especially on newer
   processors with multiple FP units).  It also allows consolidating
   the multiple implementations (for instance, the powerpc one).

2. The new algorithm is more precise with minimal performance
   difference.

The current hypot() implementation already seems to be bounded to a
maximum of 1 ulp of error; however, the proposed algorithm shows a
slight precision improvement by producing more correctly rounded
results.  With 1e9 random inputs for the different floating-point
formats I see:

  - An improvement from 3427362 to 18457 results with 1 ulp of error
    for Binary64.

  - An improvement from 233442 to 1274 results with 1 ulp of error
    for Binary96 (x86_64).

  - An improvement from 453045 to 1294 results with 1 ulp of error
    for Binary96 (x86_64).

The maximal known errors (in ulps, with the corresponding inputs) were
determined with check_sample2.c [4].  Master shows:

  binary32   0.500  0x1.3ac98p+67,-0x1.ba5ec2p+77
  binary64   0.987  -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022
  binary96   0.981  0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l
  binary128  0.985  -0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384

With the new implementation:

  binary32   0.500  0x1.3ac98p+67,-0x1.ba5ec2p+77 [same]
  binary64   0.792  0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022
  binary96   0.584  -0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l
  binary128  0.749  0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192

If FMA is used, binary64 shows slightly worse precision.

I have adapted dbl-64, ldbl-96, and ldbl-128; flt-32 is not required
since it calls the dbl-64 one.  I have not adapted ldbl-128ibm since
the format has a lot of caveats and IBM aims to move to ldbl-128.
[1] https://sourceware.org/pipermail/libc-alpha/2021-November/133523.html
[2] https://arxiv.org/pdf/1904.09481.pdf
[3] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c
[4] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c

---
Changes from v4:
* Add missing cast on generic hypotf.
* Add missing math_narrow_eval on generic hypot.
* Fixed return value on generic ldbl-96 hypotl.
* Fixed return value on generic ldbl-128 hypotl.
* Add POWER10 performance values.
* Add missing cast on i686 hypotf.
* Rewrite math-use-builtins-fmax.h and math-use-builtins-fmin.h based
  on Joseph's suggestions.

---
Adhemerval Zanella (9):
  math: Simplify hypotf implementation
  math: Use an improved algorithm for hypotl (ldbl-96)
  math: Use an improved algorithm for hypotl (ldbl-128)
  i386: Move hypot implementation to C
  math: Remove powerpc e_hypot
  math: Add math-use-builtinds-fmax.h
  math: Add math-use-builtinds-fmin.h
  aarch64: Add math-use-builtins-f{max,min}.h
  math: Remove the error handling wrapper from hypot and hypotf

Wilco Dijkstra (3):
  math: Use an improved algorithm for hypot (dbl-64)
  math: Improve hypot performance with FMA
  math: Use fmin/fmax on hypot

 math/Versions                                 |   2 +
 math/s_fmax_template.c                        |   5 +
 math/s_fmin_template.c                        |   6 +-
 math/w_hypot.c                                |   8 +
 math/w_hypot_compat.c                         |  13 +-
 math/w_hypotf.c                               |   8 +
 math/w_hypotf_compat.c                        |   6 +-
 sysdeps/aarch64/fpu/math-use-builtins-fmax.h  |   4 +
 sysdeps/aarch64/fpu/math-use-builtins-fmin.h  |   4 +
 sysdeps/aarch64/fpu/s_fmax.c                  |  28 --
 sysdeps/aarch64/fpu/s_fmaxf.c                 |  28 --
 sysdeps/aarch64/fpu/s_fmin.c                  |  28 --
 sysdeps/aarch64/fpu/s_fminf.c                 |  28 --
 sysdeps/generic/math-type-macros-double.h     |   1 +
 sysdeps/generic/math-type-macros-float.h      |   1 +
 sysdeps/generic/math-type-macros-float128.h   |   1 +
 sysdeps/generic/math-type-macros-ldouble.h    |   1 +
 sysdeps/generic/math-use-builtins-fmax.h      |   4 +
 sysdeps/generic/math-use-builtins-fmin.h      |   4 +
 sysdeps/generic/math-use-builtins.h           |   2 +
 sysdeps/i386/fpu/e_hypot.S                    |  75 -----
 sysdeps/i386/fpu/e_hypot.c                    |  58 ++++
 sysdeps/i386/fpu/e_hypotf.S                   |  64 -----
 sysdeps/ieee754/dbl-64/e_hypot.c              | 270 ++++++++----------
 sysdeps/ieee754/dbl-64/w_hypot.c              |   1 +
 sysdeps/ieee754/flt-32/e_hypotf.c             |  79 ++---
 sysdeps/ieee754/flt-32/math_config.h          |   9 +
 sysdeps/ieee754/flt-32/w_hypotf.c             |   1 +
 sysdeps/ieee754/ldbl-128/e_hypotl.c           | 226 +++++++--------
 sysdeps/ieee754/ldbl-96/e_hypotl.c            | 231 +++++++--------
 sysdeps/mach/hurd/i386/libm.abilist           |   2 +
 sysdeps/powerpc/fpu/e_hypot.c                 |  87 ------
 sysdeps/powerpc/fpu/e_hypotf.c                |  78 -----
 .../powerpc32/power4/fpu/multiarch/Makefile   |   5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     |  23 --
 .../power4/fpu/multiarch/e_hypot-ppc32.c      |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  |  33 ---
 .../power4/fpu/multiarch/e_hypotf-power7.c    |  23 --
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c |  33 ---
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libm.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libm.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libm.abilist  |   2 +
 .../sysv/linux/microblaze/be/libm.abilist     |   2 +
 .../sysv/linux/microblaze/le/libm.abilist     |   2 +
 .../unix/sysv/linux/mips/mips32/libm.abilist  |   2 +
 .../unix/sysv/linux/mips/mips64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |   2 +
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libm.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libm.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libm.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libm.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libm.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libm.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libm.abilist     |   2 +
 .../unix/sysv/linux/x86_64/64/libm.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |   2 +
 65 files changed, 547 insertions(+), 1029 deletions(-)
 create mode 100644 math/w_hypot.c
 create mode 100644 math/w_hypotf.c
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
 create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
 create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c