[v3] improve cexp performance for imaginary inputs

cexp(x) can skip the exponential in exp(x_r) * (cos(x_i) + i sin(x_i))
when the real part of the input is zero, since exp(0) = 1.
This is a common enough input to be worth optimizing, e.g. twiddle factors in
fast Fourier transforms.

Even though the exp function has a fast path for a zero argument, calling it
still imposes significant overhead on the computation.
For x = 0.5 + 0.5I the overhead is about 50%; for the range
[-pi,pi,10000] it is still 15%.

Double-precision cexp benchmark on an AMD Phenom II X4, with the change:
  "cexp": {
   "realandimag": {
    "duration": 3.20798e+10,
    "iterations": 1.47882e+08,
    "max": 665.21,
    "min": 215.071,
    "mean": 216.929
   },
   "imagonly": {
    "duration": 3.20139e+10,
    "iterations": 3.01673e+08,
    "max": 486.618,
    "min": 105.573,
    "mean": 106.121
   }
  }

Without the change:

  "cexp": {
   "realandimag": {
    "duration": 3.20785e+10,
    "iterations": 1.51123e+08,
    "max": 373.893,
    "min": 212.079,
    "mean": 212.268
   },
   "imagonly": {
    "duration": 3.2056e+10,
    "iterations": 2.03904e+08,
    "max": 715.579,
    "min": 155.042,
    "mean": 157.211
   }
  }
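Taking the "mean" values reported above (in whatever units the benchtests harness reports), the relative costs work out as:

```latex
\frac{\text{imagonly, without}}{\text{imagonly, with}} = \frac{157.211}{106.121} \approx 1.48
\qquad
\frac{\text{realandimag, with}}{\text{realandimag, without}} = \frac{216.929}{212.268} \approx 1.022
```

So the imaginary-only case took about 48% longer per call before the change, consistent with the roughly 50% overhead quoted above, while the mean for mixed real and imaginary inputs moved by about 2%.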
---
 benchtests/Makefile     | 5 +++--
 benchtests/cexp-inputs  | 7 +++++++
 benchtests/cexpf-inputs | 7 +++++++
 benchtests/cexpl-inputs | 7 +++++++
 math/s_cexp.c           | 4 +++-
 math/s_cexpf.c          | 4 +++-
 math/s_cexpl.c          | 4 +++-
 7 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100644 benchtests/cexp-inputs
 create mode 100644 benchtests/cexpf-inputs
 create mode 100644 benchtests/cexpl-inputs
