improves exp() and expf() performance on Sparc.

  These changes will be active for all Sparc platforms.

Two versions will be available: one for platforms that support
direct fp-to-int register transfer (niagara4 and later, detected by
testing HWCAP_SPARC_CRYPTO) and a default version for platforms
without that support. Both versions share source code; the only
difference is in compile-time flags.
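
The selection between the two versions can be pictured with the
following minimal sketch. It is not the code in this patch: the bit
value, the stub functions and select_exp_impl() are all hypothetical
names made up for illustration, and the real HWCAP_SPARC_CRYPTO
constant comes from the platform headers.

```c
/* Hypothetical bit value used only for this sketch; the real
   HWCAP_SPARC_CRYPTO constant comes from the platform headers.  */
#define SKETCH_HWCAP_CRYPTO 0x1UL

typedef double (*exp_impl_fn) (double);

/* Placeholder bodies.  In the patch both variants are built from the
   same source with different compile-time flags; here they only need
   to be two distinct functions.  */
static double exp_generic_stub (double x) { return x; }
static double exp_niagara4_stub (double x) { return x; }

/* Pick the niagara4 variant when the capability bit is set,
   otherwise fall back to the generic variant.  */
static exp_impl_fn
select_exp_impl (unsigned long hwcap)
{
  return (hwcap & SKETCH_HWCAP_CRYPTO) ? exp_niagara4_stub
                                       : exp_generic_stub;
}
```

The point of the sketch is only that the choice is made once from the
hardware capability word, so callers never see which variant runs.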

Typical performance gains for new exp() for Sparc niagara4 and later:

exp() - 8x to 14x (depending on input value)
expf() - 16x

Using the glibc perf tests,
      old exp    new exp
max   17629        178
min     399         29
mean   5317         71

The extreme max values for the old (ieee754) exp are due to the
extreme care the old algorithm takes when the true value is very near
0.5 ulp away from a value representable in double precision. The new
algorithm does not take special measures for those cases and can give
a different result that may be off from the true value by 0.51
ulp. The frequency of such differences appears to be less than one in
200 input values based on input tests of a range of 4000+ values.

Glibc correctness tests for exp() and expf() were run. Within the
test suite, 3 input values were found to cause 1-bit (1 ulp)
differences when the "FE_TONEAREST" rounding mode is set. No
differences were seen for the tested values in the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
This is a difference of 0.02 ulp, which is well within the 1 ulp
requirement, although the test function reports the difference as a failure.
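
One way to confirm that the two results quoted above are adjacent
doubles is to compare their bit patterns. The helper names below
(ordered_bits, ulp_distance) are made up for this sketch and are not
part of glibc or of this patch; the sketch assumes IEEE 754 binary64
doubles and does not handle NaN.

```c
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern onto a signed integer scale that is
   monotonic in the value, so that adjacent doubles differ by 1.
   Negative values are stored as sign-magnitude and get mirrored.  */
static int64_t
ordered_bits (double x)
{
  int64_t i;
  memcpy (&i, &x, sizeof i);
  return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (0 if equal).
   The subtraction is done in unsigned arithmetic to avoid signed
   overflow for operands of opposite sign.  */
static uint64_t
ulp_distance (double a, double b)
{
  int64_t ia = ordered_bits (a);
  int64_t ib = ordered_bits (b);
  return ia > ib ? (uint64_t) ia - (uint64_t) ib
                 : (uint64_t) ib - (uint64_t) ia;
}
```

Called with the hex values from the example,
ulp_distance (0x1.db14cd799387ap-3, 0x1.db14cd7993879p-3) gives 1.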

For expf(), no differences in the test suite were seen in
FE_TONEAREST, FE_DOWNWARD, or FE_TOWARDZERO. Two differences were
seen with FE_UPWARD. In both cases, the precise value was within
0.00005 of the value reported by the new code, and the old and
new results differed by 1 ulp.
Typical example:
Failure: Test: exp_upward (-0x1p-20)
Result:
 new code:  9.999990463256e-01 0x1.ffffe0p-1
 old code:  9.999991059303e-01 0x1.ffffe2p-1
 precise    9.999990463261e-01 (high precision) (using bc -l at scale=60)
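
Going only by the three values quoted above, the relationship between
them can be checked with a short sketch. The function names are
hypothetical, and the reference value is a three-term series
approximation of exp(x), which happens to be exactly representable in
double for x = -0x1p-20; it is an assumption of this sketch, not the
bc computation used above.

```c
#include <stdbool.h>

/* 1 + x + x^2/2, the first three terms of the exp(x) series.
   For x = -0x1p-20 every intermediate result is exact in double.  */
static double
exp_series3 (double x)
{
  return 1.0 + x + x * x / 2.0;
}

/* True when lo and hi are neighbouring floats in [0.5, 1) and the
   reference lies in (lo, hi], i.e. hi is the upward-rounded value.  */
static bool
upward_result_is_hi (float lo, float hi, double reference)
{
  const double ulp = 0x1p-24;   /* float spacing in [0.5, 1) */
  return (double) hi - (double) lo == ulp
         && (double) lo < reference
         && reference <= (double) hi;
}
```

With lo = 0x1.ffffe0p-1f (new code), hi = 0x1.ffffe2p-1f (old code) and
reference = exp_series3 (-0x1p-20), the check holds: the two results
are 1 ulp apart and the reference sits just above the new code's value.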
---
 sysdeps/sparc/fpu/libm_endian.h                    |   32 ++
 .../sparc/sparc32/sparcv9/fpu/multiarch/Makefile   |    5 +
 .../sparc32/sparcv9/fpu/multiarch/e_exp-generic.c  |    1 +
 .../sparc32/sparcv9/fpu/multiarch/e_exp-niagara4.c |    1 +
 .../sparc32/sparcv9/fpu/multiarch/e_expf-generic.c |    1 +
 .../sparcv9/fpu/multiarch/e_expf-niagara4.c        |    1 +
 sysdeps/sparc/sparc64/fpu/multiarch/Makefile       |    5 +
 .../sparc/sparc64/fpu/multiarch/e_exp-generic.c    |   40 ++
 .../sparc/sparc64/fpu/multiarch/e_exp-niagara4.c   |   10 +
 sysdeps/sparc/sparc64/fpu/multiarch/e_exp.h        |  383 ++++++++++++++++++
 .../sparc/sparc64/fpu/multiarch/e_expf-generic.c   |   41 ++
 .../sparc/sparc64/fpu/multiarch/e_expf-niagara4.c  |   10 +
 sysdeps/sparc/sparc64/fpu/multiarch/e_expf.h       |  405 ++++++++++++++++++++
 13 files changed, 935 insertions(+), 0 deletions(-)
 create mode 100644 sysdeps/sparc/fpu/libm_endian.h
 create mode 100644 sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/e_exp-generic.c
 create mode 100644 sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/e_exp-niagara4.c
 create mode 100644 sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/e_expf-generic.c
 create mode 100644 sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/e_expf-niagara4.c
 create mode 100644 sysdeps/sparc/sparc64/fpu/multiarch/e_exp-generic.c
 create mode 100644 sysdeps/sparc/sparc64/fpu/multiarch/e_exp-niagara4.c
 create mode 100644 sysdeps/sparc/sparc64/fpu/multiarch/e_exp.h
 create mode 100644 sysdeps/sparc/sparc64/fpu/multiarch/e_expf-generic.c
 create mode 100644 sysdeps/sparc/sparc64/fpu/multiarch/e_expf-niagara4.c
 create mode 100644 sysdeps/sparc/sparc64/fpu/multiarch/e_expf.h
