s_sinf.c: Replace floor with FLOOR_DOUBLE_TO_INT/FLOOR_INT_TO_DOUBLE_HALF
Commit Message
Since s_sinf.c either assigns the return value of floor to integer
or passes double converted from integer to floor, this patch adds
FLOOR_DOUBLE_TO_INT and FLOOR_INT_TO_DOUBLE_HALF to replace floor.
They are default to floor. A target can define FLOOR_DOUBLE_TO_INT
and FLOOR_INT_TO_DOUBLE_HALF as simple casts to avoid calling floor.
Also since long == int for 32-bit targets, we can use long instead of
int to avoid 64-bit integer for 64-bit targets.
On Skylake, bench-sinf reports performance improvement:
Before After Improvement
max 130.566 129.564 0.8%
min 7.704 7.706 0%
mean 21.8188 19.1363 14%
Any comments?
H.J.
---
* sysdeps/generic/math_private.h (FLOOR_DOUBLE_TO_INT): New.
(FLOOR_INT_TO_DOUBLE_HALF): Likewise.
* sysdeps/ieee754/flt-32/s_sinf.c (reduced): Replace long with
int.
(SINF_FUNC): Likewise. Replace floor with FLOOR_DOUBLE_TO_INT
and FLOOR_INT_TO_DOUBLE_HALF.
* sysdeps/x86_64/fpu/math_private.h (__floor_double_to_int):
New.
(FLOOR_DOUBLE_TO_INT): Likewise.
(__floor_int_to_double_half): Likewise.
(FLOOR_INT_TO_DOUBLE_HALF): Likewise.
---
sysdeps/generic/math_private.h | 8 ++++++++
sysdeps/ieee754/flt-32/s_sinf.c | 13 +++++++------
sysdeps/x86_64/fpu/math_private.h | 17 +++++++++++++++++
3 files changed, 32 insertions(+), 6 deletions(-)
Comments
On Tue, 5 Dec 2017, H.J. Lu wrote:
> diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
> index f29898c19c..a2cdce5b6a 100644
> --- a/sysdeps/generic/math_private.h
> +++ b/sysdeps/generic/math_private.h
> @@ -184,6 +184,14 @@ do { \
> } while (0)
> #endif
>
> +#ifndef FLOOR_DOUBLE_TO_INT
> +# define FLOOR_DOUBLE_TO_INT(x) ((int) __floor (x))
> +#endif
> +
> +#ifndef FLOOR_INT_TO_DOUBLE_HALF
> +# define FLOOR_INT_TO_DOUBLE_HALF(x) __floor ((x) / 2.0)
> +#endif
These need comments defining their semantics (argument types and ranges,
return values and types).
The existing code is using unsigned int. You're using int here for the
return type of FLOOR_DOUBLE_TO_INT but still converting to unsigned int in
the caller. (The semantics in the caller do in fact include that the
argument is positive but in the range of int.)
I'd be surprised if the FLOOR_INT_TO_DOUBLE_HALF definition dividing by
2.0 and calling floor, as opposed to dividing by integer 2 and converting
the result to double (which is just a right shift, given the argument is
unsigned) is optimal anywhere, unless the compiler optimizes it into an
integer shift anyway. Thus, it would make sense just to use "n / 2"
unconditionally in place of "__floor (n / 2.0)".
Similarly, I'd expect a direct conversion to int (without a call to
__floor) to be at least as good as the present code for
FLOOR_DOUBLE_TO_INT everywhere, given the range constraints that make it
OK to do so.
So I think the natural change would not need to add any new macros, but
would change s_sinf.c unconditionally to do essentially what your patch
does in the x86_64 case. As the original patch of Rajalakshmi's was
benchmarked on s390, you could ask her to confirm such a change makes
performance no worse there (and possibly improves it).
On Tue, Dec 5, 2017 at 5:32 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Tue, 5 Dec 2017, H.J. Lu wrote:
>
>> diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
>> index f29898c19c..a2cdce5b6a 100644
>> --- a/sysdeps/generic/math_private.h
>> +++ b/sysdeps/generic/math_private.h
>> @@ -184,6 +184,14 @@ do { \
>> } while (0)
>> #endif
>>
>> +#ifndef FLOOR_DOUBLE_TO_INT
>> +# define FLOOR_DOUBLE_TO_INT(x) ((int) __floor (x))
>> +#endif
>> +
>> +#ifndef FLOOR_INT_TO_DOUBLE_HALF
>> +# define FLOOR_INT_TO_DOUBLE_HALF(x) __floor ((x) / 2.0)
>> +#endif
>
> These need comments defining their semantics (argument types and ranges,
> return values and types).
>
> The existing code is using unsigned int. You're using int here for the
> return type of FLOOR_DOUBLE_TO_INT but still converting to unsigned int in
> the caller. (The semantics in the caller do in fact include that the
> argument is positive but in the range of int.)
>
> I'd be surprised if the FLOOR_INT_TO_DOUBLE_HALF definition dividing by
> 2.0 and calling floor, as opposed to dividing by integer 2 and converting
> the result to double (which is just a right shift, given the argument is
> unsigned) is optimal anywhere, unless the compiler optimizes it into an
> integer shift anyway. Thus, it would make sense just to use "n / 2"
> unconditionally in place of "__floor (n / 2.0)".
>
> Similarly, I'd expect a direct conversion to int (without a call to
> __floor) to be at least as good as the present code for
> FLOOR_DOUBLE_TO_INT everywhere, given the range constraints that make it
> OK to do so.
>
> So I think the natural change would not need to add any new macros, but
> would change s_sinf.c unconditionally to do essentially what your patch
> does in the x86_64 case. As the original patch of Rajalakshmi's was
> benchmarked on s390, you could ask her to confirm such a change makes
> performance no worse there (and possibly improves it).
Here is the patch to replace floor with simple casts. It improves performance
of max and mean by 30% on x86-64. Rajalakshmi, can you try it on s390?
Thanks.
On 12/05/2017 07:29 PM, H.J. Lu wrote:
> On Tue, Dec 5, 2017 at 5:32 AM, Joseph Myers<joseph@codesourcery.com> wrote:
>> On Tue, 5 Dec 2017, H.J. Lu wrote:
>>
>>> diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
>>> index f29898c19c..a2cdce5b6a 100644
>>> --- a/sysdeps/generic/math_private.h
>>> +++ b/sysdeps/generic/math_private.h
>>> @@ -184,6 +184,14 @@ do { \
>>> } while (0)
>>> #endif
>>>
>>> +#ifndef FLOOR_DOUBLE_TO_INT
>>> +# define FLOOR_DOUBLE_TO_INT(x) ((int) __floor (x))
>>> +#endif
>>> +
>>> +#ifndef FLOOR_INT_TO_DOUBLE_HALF
>>> +# define FLOOR_INT_TO_DOUBLE_HALF(x) __floor ((x) / 2.0)
>>> +#endif
>> These need comments defining their semantics (argument types and ranges,
>> return values and types).
>>
>> The existing code is using unsigned int. You're using int here for the
>> return type of FLOOR_DOUBLE_TO_INT but still converting to unsigned int in
>> the caller. (The semantics in the caller do in fact include that the
>> argument is positive but in the range of int.)
>>
>> I'd be surprised if the FLOOR_INT_TO_DOUBLE_HALF definition dividing by
>> 2.0 and calling floor, as opposed to dividing by integer 2 and converting
>> the result to double (which is just a right shift, given the argument is
>> unsigned) is optimal anywhere, unless the compiler optimizes it into an
>> integer shift anyway. Thus, it would make sense just to use "n / 2"
>> unconditionally in place of "__floor (n / 2.0)".
>>
>> Similarly, I'd expect a direct conversion to int (without a call to
>> __floor) to be at least as good as the present code for
>> FLOOR_DOUBLE_TO_INT everywhere, given the range constraints that make it
>> OK to do so.
>>
>> So I think the natural change would not need to add any new macros, but
>> would change s_sinf.c unconditionally to do essentially what your patch
>> does in the x86_64 case. As the original patch of Rajalakshmi's was
>> benchmarked on s390, you could ask her to confirm such a change makes
>> performance no worse there (and possibly improves it).
> Here is the patch to replace floor with simple casts. It improves performance
> of max and mean by 30% on x86-64. Rajalakshmi, can you try it on s390?
Mean value reduced from 22.0 to 20.0 with this patch on s390x.
On Tue, Dec 5, 2017 at 6:29 AM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
>
>
> On 12/05/2017 07:29 PM, H.J. Lu wrote:
>>
>> On Tue, Dec 5, 2017 at 5:32 AM, Joseph Myers<joseph@codesourcery.com>
>> wrote:
>>>
>>> On Tue, 5 Dec 2017, H.J. Lu wrote:
>>>
>>>> diff --git a/sysdeps/generic/math_private.h
>>>> b/sysdeps/generic/math_private.h
>>>> index f29898c19c..a2cdce5b6a 100644
>>>> --- a/sysdeps/generic/math_private.h
>>>> +++ b/sysdeps/generic/math_private.h
>>>> @@ -184,6 +184,14 @@ do {
>>>> \
>>>> } while (0)
>>>> #endif
>>>>
>>>> +#ifndef FLOOR_DOUBLE_TO_INT
>>>> +# define FLOOR_DOUBLE_TO_INT(x) ((int) __floor (x))
>>>> +#endif
>>>> +
>>>> +#ifndef FLOOR_INT_TO_DOUBLE_HALF
>>>> +# define FLOOR_INT_TO_DOUBLE_HALF(x) __floor ((x) / 2.0)
>>>> +#endif
>>>
>>> These need comments defining their semantics (argument types and ranges,
>>> return values and types).
>>>
>>> The existing code is using unsigned int. You're using int here for the
>>> return type of FLOOR_DOUBLE_TO_INT but still converting to unsigned int
>>> in
>>> the caller. (The semantics in the caller do in fact include that the
>>> argument is positive but in the range of int.)
>>>
>>> I'd be surprised if the FLOOR_INT_TO_DOUBLE_HALF definition dividing by
>>> 2.0 and calling floor, as opposed to dividing by integer 2 and converting
>>> the result to double (which is just a right shift, given the argument is
>>> unsigned) is optimal anywhere, unless the compiler optimizes it into an
>>> integer shift anyway. Thus, it would make sense just to use "n / 2"
>>> unconditionally in place of "__floor (n / 2.0)".
>>>
>>> Similarly, I'd expect a direct conversion to int (without a call to
>>> __floor) to be at least as good as the present code for
>>> FLOOR_DOUBLE_TO_INT everywhere, given the range constraints that make it
>>> OK to do so.
>>>
>>> So I think the natural change would not need to add any new macros, but
>>> would change s_sinf.c unconditionally to do essentially what your patch
>>> does in the x86_64 case. As the original patch of Rajalakshmi's was
>>> benchmarked on s390, you could ask her to confirm such a change makes
>>> performance no worse there (and possibly improves it).
>>
>> Here is the patch to replace floor with simple casts. It improves
>> performance
>> of max and mean by 30% on x86-64. Rajalakshmi, can you try it on s390?
>
>
> Mean value reduced from 22.0 to 20.0 with this patch on s390x.
>
Any objections/comenrts for my patch:
https://sourceware.org/ml/libc-alpha/2017-12/msg00121.html
to master?
Thanks.
On Tue, 5 Dec 2017, Rajalakshmi Srinivasaraghavan wrote:
> > Here is the patch to replace floor with simple casts. It improves
> > performance
> > of max and mean by 30% on x86-64. Rajalakshmi, can you try it on s390?
>
> Mean value reduced from 22.0 to 20.0 with this patch on s390x.
Thanks. The patch is OK.
@@ -184,6 +184,14 @@ do { \
} while (0)
#endif
+#ifndef FLOOR_DOUBLE_TO_INT
+# define FLOOR_DOUBLE_TO_INT(x) ((int) __floor (x))
+#endif
+
+#ifndef FLOOR_INT_TO_DOUBLE_HALF
+# define FLOOR_INT_TO_DOUBLE_HALF(x) __floor ((x) / 2.0)
+#endif
+
/* We need to guarantee an expansion of name when building
ldbl-128 files as another type (e.g _Float128). */
#define mathx_hidden_def(name) hidden_def(name)
@@ -85,8 +85,8 @@ static const int ones[] = { +1, -1 };
SIGNBIT is used to add the correct sign after the Chebyshev
polynomial is computed. */
static inline float
-reduced (const double theta, const unsigned long int n,
- const unsigned long int signbit)
+reduced (const double theta, const unsigned int n,
+ const unsigned int signbit)
{
double sx;
const double theta2 = theta * theta;
@@ -162,14 +162,14 @@ SINF_FUNC (float x)
}
else /* |x| >= Pi/4. */
{
- unsigned long int signbit = (x < 0);
+ unsigned int signbit = (x < 0);
if (abstheta < 9 * M_PI_4) /* |x| < 9*Pi/4. */
{
/* There are cases where FE_UPWARD rounding mode can
produce a result of abstheta * inv_PI_4 == 9,
where abstheta < 9pi/4, so the domain for
pio2_table must go to 5 (9 / 2 + 1). */
- unsigned long int n = (abstheta * inv_PI_4) + 1;
+ unsigned int n = (abstheta * inv_PI_4) + 1;
theta = abstheta - pio2_table[n / 2];
return reduced (theta, n, signbit);
}
@@ -177,8 +177,9 @@ SINF_FUNC (float x)
{
if (abstheta < 0x1p+23) /* |x| < 2^23. */
{
- unsigned long int n = __floor (abstheta * inv_PI_4) + 1.0;
- double x = __floor (n / 2.0);
+ unsigned int n
+ = FLOOR_DOUBLE_TO_INT (abstheta * inv_PI_4) + 1.0;
+ double x = FLOOR_INT_TO_DOUBLE_HALF (n);
theta = x * PI_2_lo + (x * PI_2_hi + abstheta);
/* Argument reduction needed. */
return reduced (theta, n, signbit);
@@ -45,6 +45,23 @@
f = f__; \
} while (0)
+extern inline int
+__floor_double_to_int (double x)
+{
+ return x;
+}
+
+#define FLOOR_DOUBLE_TO_INT(x) __floor_double_to_int (x)
+
+extern inline double
+__floor_int_to_double_half (int x)
+{
+ x /= 2;
+ return x;
+}
+
+#define FLOOR_INT_TO_DOUBLE_HALF(x) __floor_int_to_double_half (x)
+
#include <sysdeps/i386/fpu/fenv_private.h>
#include_next <math_private.h>