Message ID | HE1PR08MB103555974BE551C979EFD28883300@HE1PR08MB1035.eurprd08.prod.outlook.com |
---|---|
State | New, archived |
On Wed, 22 Aug 2018, Wilco Dijkstra wrote:

> Joseph Myers wrote:
>
> > +
> > +static inline int32_t
> > +rem_pio2f (float x, float *y)
>
> > Please put a comment on this function documenting its semantics.
>
> Done, see below.

This version is OK.
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:

> Joseph Myers wrote:
>
>> +
>> +static inline int32_t
>> +rem_pio2f (float x, float *y)
>
>> Please put a comment on this function documenting its semantics.
>
> Done, see below.
>
>
> Speed up tanf range reduction by using the new sincosf range
> reduction algorithm.  Overall code quality is improved due to
> inlining, so there is a speedup even if no range reduction is
> required.
>
> Passes GLIBC testsuite on AArch64.  Some files are no longer
> required which are removed in the next patch.
>
> tanf throughput gains on Cortex-A72:
> * |x| < M_PI_4 : 1.1x
> * |x| < M_PI_2 : 1.2x
> * |x| < 2 * M_PI: 1.5x
> * |x| < 120.0 : 1.6x
> * |x| < Inf : 12.1x

LGTM too.

If we were to have a benchtest for tanf with drand48 inputs, should we group
the entries according to __kernel_tanf() ?  e.g.

* |x|>=0.6744 - fast path for __kernel_tanf
* |x|<=0.6744
Tulio Magno Quites Machado Filho wrote:

> LGTM too.

Thanks, I've committed it.

> If we were to have a benchtest for tanf with drand48 inputs, should we group
> the entries according to __kernel_tanf() ?  e.g.
>
> * |x|>=0.6744 - fast path for __kernel_tanf
> * |x|<=0.6744

Having a real trace would be best, and randomized inputs for commonly used input
ranges are useful.  I'm not sure whether a better algorithm would have the same
slow/fast paths as the current code (sinf/cosf are now 4x faster...), but if
you're planning to post a benchtest then testing those ranges would be fine.

Cheers,
Wilco
diff --git a/sysdeps/ieee754/flt-32/s_tanf.c b/sysdeps/ieee754/flt-32/s_tanf.c
index ba3af54913669e4abdfd864307856ec44138f9b9..fd104103ad026a8c87ea7b571f13e868561a2998 100644
--- a/sysdeps/ieee754/flt-32/s_tanf.c
+++ b/sysdeps/ieee754/flt-32/s_tanf.c
@@ -21,6 +21,33 @@ static char rcsid[] = "$NetBSD: s_tanf.c,v 1.4 1995/05/10 20:48:20 jtc Exp $";
 #include <math.h>
 #include <math_private.h>
 #include <libm-alias-float.h>
+#include "s_sincosf.h"
+
+/* Reduce range of X to a multiple of PI/2.  The modulo result is between
+   -PI/4 and PI/4 and returned as a high part y[0] and a low part y[1].
+   The low bit in the return value indicates the first or 2nd half of tanf.  */
+static inline int32_t
+rem_pio2f (float x, float *y)
+{
+  double dx = x;
+  int n;
+  const sincos_t *p = &__sincosf_table[0];
+
+  if (__glibc_likely (abstop12 (x) < abstop12 (120.0f)))
+    dx = reduce_fast (dx, p, &n);
+  else
+    {
+      uint32_t xi = asuint (x);
+      int sign = xi >> 31;
+
+      dx = reduce_large (xi, &n);
+      dx = sign ? -dx : dx;
+    }
+
+  y[0] = dx;
+  y[1] = dx - y[0];
+  return n;
+}
 
 float __tanf(float x)
 {
@@ -42,7 +69,7 @@ float __tanf(float x)
 
     /* argument reduction needed */
         else {
-            n = __ieee754_rem_pio2f(x,y);
+            n = rem_pio2f(x,y);
             return __kernel_tanf(y[0],y[1],1-((n&1)<<1)); /*   1 -- n even
                                                               -1 -- n odd */
         }