From patchwork Tue Dec  7 19:03:41 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella <adhemerval.zanella@linaro.org>
X-Patchwork-Id: 48595
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id DD4BF3858439
	for <patchwork@sourceware.org>; Tue,  7 Dec 2021 19:04:20 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DD4BF3858439
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1638903860;
	bh=BczRdudcKBp7fdqU3oELz+nqMtE6o6AjrzrpCa+JJKQ=;
	h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=Q4Cd+ma9rQrMsgsa1+SXq7QIkef4t9h4VNQ9sIY/yEubVDbhraQsHA4Jtkf3gzL10
	 9Mics+HePK0XTb4nS11YnU0QjANG4UgAsesfMHfxdD2Bhz42bYbxSGtXphv0MUpNvl
	 Y7NQSaT9OQsP94unwJW4tVNhh5SRUmg40pjEBeqI=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail-ua1-x929.google.com (mail-ua1-x929.google.com
 [IPv6:2607:f8b0:4864:20::929])
 by sourceware.org (Postfix) with ESMTPS id 9CFBC3858D28
 for <libc-alpha@sourceware.org>; Tue,  7 Dec 2021 19:03:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9CFBC3858D28
Received: by mail-ua1-x929.google.com with SMTP id o1so337805uap.4
 for <libc-alpha@sourceware.org>; Tue, 07 Dec 2021 11:03:58 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:subject:date:message-id:mime-version
 :content-transfer-encoding;
 bh=BczRdudcKBp7fdqU3oELz+nqMtE6o6AjrzrpCa+JJKQ=;
 b=3Qq1hleQUUTlp0LfMZZYTYp6O0feJO7IXZSj+1arjtTMmbYhVzQFvUNgEYepg7tdjS
 9YTRVNbeXboW9zKFAl+o0QrBI2+doKKAbfc/yZHhGbMA0EyyZosIYCBvgdqsYXMu+QDz
 gNrXytAAOb1EqICpj1abPNFFPLpuSBTUvSJanW9WQuL/3tDd9hKGgu5MfwzIlv7kdJak
 jnoHHnPJ7X/K4ZvVzYay5VYCardoxP3YCtAcFAl2uGx68ORyokB3xXxJVRf0Yg1qyGhG
 JS6dNFZIm3p0wuSeNTWlXSbSBbAHNzH7yOm1ch4q/sZVSNSAFNQWl/3p8NlzKT5AWT5r
 Am9w==
X-Gm-Message-State: AOAM530HOUTAUq/RiW62iWCZjL1sQT9EsZALaYedREh5BCvPcaloIjub
 Ka0yaJpF3Dd+zaeVDNgFf1yHFdgeXUBX+w==
X-Google-Smtp-Source: 
 ABdhPJyA2C6PSthC3lqhAuNx5VHjvMtetp+Ajq8/3GE3yaM/3mpOp877HuYSzVcODlG/1EVH/hK4+g==
X-Received: by 2002:a67:d583:: with SMTP id m3mr46291587vsj.16.1638903837834;
 Tue, 07 Dec 2021 11:03:57 -0800 (PST)
Received: from birita.. ([2804:431:c7ca:a776:246c:70fd:1377:eec7])
 by smtp.gmail.com with ESMTPSA id l190sm312904vsc.26.2021.12.07.11.03.56
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 07 Dec 2021 11:03:57 -0800 (PST)
To: libc-alpha@sourceware.org, Paul Zimmermann <Paul.Zimmermann@inria.fr>,
 Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Subject: [PATCH v5 00/12] Improve hypot
Date: Tue,  7 Dec 2021 16:03:41 -0300
Message-Id: <20211207190353.3282666-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.32.0
MIME-Version: 1.0
X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha
 <libc-alpha@sourceware.org>
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

This patchset add a different algorithm along with Wilco [1] performance
improvements.  The default implementation is based on the 'An Improved 
Algorithm for hypot(a,b)' by Carlos F. Borges [2] with some fixes and
improvements (although the dbl-64 one uses a slight different approach
when fast FMA is avaliable).  This method is also used by Julia language
runtime [2].

The motivation for this change are:

  1. Use a newer algotihm that favor FP operations over interger ones
     (which tends to show better performance specially on newer
     processor with multiple FP units).  It also allows consolidate
     the multiple implementation (for instance, the powerpc one).

  2. The new algorithm is more precise without minimum performance
     difference.

The current hypot() implementation seems already to be bounded to a
maximum of 1 ulp of error, however the new proposed algorithm shows an
slight precision improvement by showing more correctly rounded results.

With a random 1e9 inputs for different float format I see:

  - An improvement from 3427362 to 18457 results with 1 ulp of
    error for Binary64.
  - An improvement from 233442 to 1274 results with 1 ulp of
    error for Binary96 (x86_64).
  - An improvement from 453045 to 1294 results with 1 ulp of
    error for Binary96 (x86_64).

Also for the maximal known error master shows (in ulps, with
corresponding inputs), determined with [3]:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77
  binary64  0.987 -0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022
  binary96  0.981
0x1.73f339f61eda21dp-16384l,0x2.e45f9f9500877e2p-16384l
  binary128 0.985
-0x2.d8311789103b76133ea1d5bc38c4p-16384,-0x1.6d85492006d7dcc6cc52938684p-16384

With the new implementation:

  binary32  0.500 0x1.3ac98p+67,-0x1.ba5ec2p+77 [same]
  binary64  0.792 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022
  binary96  0.584
-0x2.97b86706043d619p+7240l,0x1.8256bdd12d2e163ep+7240l
  binary128 0.749
0x2.2d5faf4036d6e68566f01054612p-8192,0x3.5738e8e2505f5d1fc2973716f05p-8192

If FMA is uses the binary64 shows a slight worse precision:

I have adapted the dbl-64, ldbl-96, and ldbl-128, the flt-32 is not
required since it calls the dbl-64 one.  I have not adapated ldbl-128ibm
since the format has a lot of caveats and IBM aims to move to ldbl-128.

[1] https://sourceware.org/pipermail/libc-alpha/2021-November/133523.html
[1] https://arxiv.org/pdf/1904.09481.pdf
[2] https://github.com/JuliaLang/julia/commit/4a046009a3362ab5e17d369641dbbc9657eb680c
[3] https://gitlab.inria.fr/zimmerma/math_accuracy/-/blob/master/binary64/check_sample2.c
---
Changes from v5:
 * Add missing cast on generic hypotf.
 * Add missing math_narrow_eval on generic hypot.
 * Fixed return value on generic ldbl-96 hypotl.
 * Fixed return value on generic ldbl-128 hypotl.
 * Add POWER10 performance values.
 * Add missing cast on i686 hypotf.
 * Rewrite math-use-builtinds-fmax.h and math-use-builtinds-fmin.h
   based on Joseph's suggestions.
---

Adhemerval Zanella (9):
  math: Simplify hypotf implementation
  math: Use an improved algorithm for hypotl (ldbl-96)
  math: Use an improved algorithm for hypotl (ldbl-128)
  i386: Move hypot implementation to C
  math: Remove powerpc e_hypot
  math: Add math-use-builtinds-fmax.h
  math: Add math-use-builtinds-fmin.h
  aarch64: Add math-use-builtins-f{max,min}.h
  math: Remove the error handling wrapper from hypot and hypotf

Wilco Dijkstra (3):
  math: Use an improved algorithm for hypot (dbl-64)
  math: Improve hypot performance with FMA
  math: Use fmin/fmax on hypot

 math/Versions                                 |   2 +
 math/s_fmax_template.c                        |   5 +
 math/s_fmin_template.c                        |   6 +-
 math/w_hypot.c                                |   8 +
 math/w_hypot_compat.c                         |  13 +-
 math/w_hypotf.c                               |   8 +
 math/w_hypotf_compat.c                        |   6 +-
 sysdeps/aarch64/fpu/math-use-builtins-fmax.h  |   4 +
 sysdeps/aarch64/fpu/math-use-builtins-fmin.h  |   4 +
 sysdeps/aarch64/fpu/s_fmax.c                  |  28 --
 sysdeps/aarch64/fpu/s_fmaxf.c                 |  28 --
 sysdeps/aarch64/fpu/s_fmin.c                  |  28 --
 sysdeps/aarch64/fpu/s_fminf.c                 |  28 --
 sysdeps/generic/math-type-macros-double.h     |   1 +
 sysdeps/generic/math-type-macros-float.h      |   1 +
 sysdeps/generic/math-type-macros-float128.h   |   1 +
 sysdeps/generic/math-type-macros-ldouble.h    |   1 +
 sysdeps/generic/math-use-builtins-fmax.h      |   4 +
 sysdeps/generic/math-use-builtins-fmin.h      |   4 +
 sysdeps/generic/math-use-builtins.h           |   2 +
 sysdeps/i386/fpu/e_hypot.S                    |  75 -----
 sysdeps/i386/fpu/e_hypot.c                    |  58 ++++
 sysdeps/i386/fpu/e_hypotf.S                   |  64 -----
 sysdeps/ieee754/dbl-64/e_hypot.c              | 270 ++++++++----------
 sysdeps/ieee754/dbl-64/w_hypot.c              |   1 +
 sysdeps/ieee754/flt-32/e_hypotf.c             |  79 ++---
 sysdeps/ieee754/flt-32/math_config.h          |   9 +
 sysdeps/ieee754/flt-32/w_hypotf.c             |   1 +
 sysdeps/ieee754/ldbl-128/e_hypotl.c           | 226 +++++++--------
 sysdeps/ieee754/ldbl-96/e_hypotl.c            | 231 +++++++--------
 sysdeps/mach/hurd/i386/libm.abilist           |   2 +
 sysdeps/powerpc/fpu/e_hypot.c                 |  87 ------
 sysdeps/powerpc/fpu/e_hypotf.c                |  78 -----
 .../powerpc32/power4/fpu/multiarch/Makefile   |   5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     |  23 --
 .../power4/fpu/multiarch/e_hypot-ppc32.c      |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  |  33 ---
 .../power4/fpu/multiarch/e_hypotf-power7.c    |  23 --
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     |  23 --
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c |  33 ---
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |   2 +
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libm.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libm.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libm.abilist  |   2 +
 .../sysv/linux/microblaze/be/libm.abilist     |   2 +
 .../sysv/linux/microblaze/le/libm.abilist     |   2 +
 .../unix/sysv/linux/mips/mips32/libm.abilist  |   2 +
 .../unix/sysv/linux/mips/mips64/libm.abilist  |   2 +
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |   2 +
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libm.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libm.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libm.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libm.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libm.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libm.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libm.abilist     |   2 +
 .../unix/sysv/linux/x86_64/64/libm.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |   2 +
 65 files changed, 547 insertions(+), 1029 deletions(-)
 create mode 100644 math/w_hypot.c
 create mode 100644 math/w_hypotf.c
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmax.h
 create mode 100644 sysdeps/aarch64/fpu/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/aarch64/fpu/s_fmax.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmaxf.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fmin.c
 delete mode 100644 sysdeps/aarch64/fpu/s_fminf.c
 create mode 100644 sysdeps/generic/math-use-builtins-fmax.h
 create mode 100644 sysdeps/generic/math-use-builtins-fmin.h
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
 create mode 100644 sysdeps/ieee754/dbl-64/w_hypot.c
 create mode 100644 sysdeps/ieee754/flt-32/w_hypotf.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c