[1/3] New generic sincosf
Commit Message
On Fri, Dec 15, 2017 at 8:34 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 8:24 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>
>>> On Fri, Dec 15, 2017 at 7:48 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>>> > On Fri, 15 Dec 2017, H.J. Lu wrote:
>>> >
>>> >> > glibc is build with -fmerge-all-constants, so constants should already be
>>> >> > shared at link time; making constants hidden would be relevant only for
>>> >> > arrays, not for individual floating-point numbers (and if the individual
>>> >> > constants aren't put in appropriate sections with link-time merging, it's
>>> >> > probably because not doing so is more efficient on a particular
>>> >> > architecture).
>>> >>
>>> >> What do you suggest for x86-64 to avoid array duplication?
>>> >
>>> > Well, a followup patch to refactor the arrays into hidden variables would
>>> > be reasonable once the sincosf patch is in - but it would need
>>> > benchmarking to make sure it doesn't adversely affect performance.
>>>
>>> Do you have any suspicions to indicate that hidden array may be
>>> slower than local one to access?
>>
>> On some architectures, GCC uses section anchors for access to variables
>> defined in the same source file. In such cases, moving the arrays to
>> being hidden and defined separately means that the address of each array
>> needs to be loaded separately because the offsets between different
>> variables are no longer known when the source files using them are being
>> compiled, whereas if defined in each source file, a function accessing
>> multiple arrays could load the anchor address and them compute all the
>> other addresses with small offsets from it.
>
> This won't show up on x86.
>
>> You should be able to avoid that problem by putting all the data in a
>> single structure, as done in e_exp2f_data.c, rather than separate hidden
>> variables.
>>
>
> Good to know.
>
> I have no objections now. I will prepare a patch after s_sincosf.c is checked
> in.
>
One comment, should sysdeps/ieee754/flt-32/s_sincos.h be renamed
to sysdeps/ieee754/flt-32/s_sincosf.h?
I am enclosing a patch to add sysdeps/ieee754/flt-32/s_sincosf_data.c.
It saves 352 bytes on x86-64.
H.J.
Comments
On Fri, 15 Dec 2017, H.J. Lu wrote:
> One comment, should sysdeps/ieee754/flt-32/s_sincos.h be renamed
> to sysdeps/ieee754/flt-32/s_sincosf.h?
Yes, I think so.
On 12/16/2017 02:20 AM, Joseph Myers wrote:
> On Fri, 15 Dec 2017, H.J. Lu wrote:
>
>> One comment, should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>> to sysdeps/ieee754/flt-32/s_sincosf.h?
>
> Yes, I think so.
>
Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
name changed as s_sincosf.h. Thanks for the review.
On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
>
>
> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>
>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>
>>> One comment, should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>> to sysdeps/ieee754/flt-32/s_sincosf.h?
>>
>>
>> Yes, I think so.
>>
>
> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
> name changed as s_sincosf.h. Thanks for the review.
>
I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
Generic:
"max": 276.971,
"min": 10.813,
"mean": 29.3755
SSE2:
"max": 138.795,
"min": 11.686,
"mean": 22.9463
The SSE2 is 28% faster. Do they use the same algorithm?
On 12/16/2017 08:13 PM, H.J. Lu wrote:
> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
> <raji@linux.vnet.ibm.com> wrote:
>>
>>
>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>
>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>
>>>> One comment, should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>> to sysdeps/ieee754/flt-32/s_sincosf.h?
>>>
>>>
>>> Yes, I think so.
>>>
>>
>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>> name changed as s_sincosf.h. Thanks for the review.
>>
>
> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
>
> Generic:
>
> "max": 276.971,
> "min": 10.813,
> "mean": 29.3755
>
> SSE2:
>
> "max": 138.795,
> "min": 11.686,
> "mean": 22.9463
>
> The SSE2 is 28% faster. Do they use the same algorithm?
Yes, they are same. One small difference is generic version calls
reduced_sin() and reduced_cos() whereas asm version handles both in the
same branch for reconstruction.
>
>
On Sun, Dec 17, 2017 at 7:26 PM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
>
>
> On 12/16/2017 08:13 PM, H.J. Lu wrote:
>>
>> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
>> <raji@linux.vnet.ibm.com> wrote:
>>>
>>>
>>>
>>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>>
>>>>
>>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>>
>>>>> One comment, should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>>> to sysdeps/ieee754/flt-32/s_sincosf.h?
>>>>
>>>>
>>>>
>>>> Yes, I think so.
>>>>
>>>
>>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>>> name changed as s_sincosf.h. Thanks for the review.
>>>
>>
>> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
>>
>> Generic:
>>
>> "max": 276.971,
>> "min": 10.813,
>> "mean": 29.3755
>>
>> SSE2:
>>
>> "max": 138.795,
>> "min": 11.686,
>> "mean": 22.9463
>>
>> The SSE2 is 28% faster. Do they use the same algorithm?
>
>
> Yes, they are same. One small difference is generic version calls
> reduced_sin() and reduced_cos() whereas asm version handles both in the same
> branch for reconstruction.
>
Can generic version do the same?
On 12/19/2017 10:05 PM, H.J. Lu wrote:
> On Sun, Dec 17, 2017 at 7:26 PM, Rajalakshmi Srinivasaraghavan
> <raji@linux.vnet.ibm.com> wrote:
>>
>>
>> On 12/16/2017 08:13 PM, H.J. Lu wrote:
>>>
>>> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
>>> <raji@linux.vnet.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>>>
>>>>>
>>>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>>>
>>>>>> One comment, should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>>>> to sysdeps/ieee754/flt-32/s_sincosf.h?
>>>>>
>>>>>
>>>>>
>>>>> Yes, I think so.
>>>>>
>>>>
>>>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>>>> name changed as s_sincosf.h. Thanks for the review.
>>>>
>>>
>>> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
>>>
>>> Generic:
>>>
>>> "max": 276.971,
>>> "min": 10.813,
>>> "mean": 29.3755
>>>
>>> SSE2:
>>>
>>> "max": 138.795,
>>> "min": 11.686,
>>> "mean": 22.9463
>>>
>>> The SSE2 is 28% faster. Do they use the same algorithm?
>>
>>
>> Yes, they are same. One small difference is generic version calls
>> reduced_sin() and reduced_cos() whereas asm version handles both in the same
>> branch for reconstruction.
>>
>
> Can generic version do the same?
I don't see any difference in mean value with that change.
>
>
>
From 9ba68929320c3bdb50b3aadc5d024318362fbfba Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 15 Dec 2017 12:08:22 -0800
Subject: [PATCH] Add s_sincosf_data.c
Add s_sincosf_data.c to data between sinf, cosf and sincosf.
Tested on x86-64. It reduced the read-only data size by 352 bytes.
* math/Makefile (type-float-routines): Add s_sincosf_data.
* sysdeps/ieee754/flt-32/math_config.h (SINCOSF_PIO2_TABLE):
New.
(SINCOSF_INVPIO4_TABLE): Likewise.
(SINCOSF_DOUBLE_ONES): Likewise.
(__sincosf_data): Likewise.
* sysdeps/ieee754/flt-32/s_sincos.h: Include "math_config.h".
(pio2_table): Replaced with __sincosf_data.
(invpio4_table): Likewise.
(ones): Likewise.
* sysdeps/ieee754/flt-32/s_sincosf_data.c: New file.
---
math/Makefile | 2 +-
sysdeps/ieee754/flt-32/math_config.h | 11 +++++++++
sysdeps/ieee754/flt-32/s_sincos.h | 26 ++++-----------------
sysdeps/ieee754/flt-32/s_sincosf_data.c | 41 +++++++++++++++++++++++++++++++++
4 files changed, 58 insertions(+), 22 deletions(-)
create mode 100644 sysdeps/ieee754/flt-32/s_sincosf_data.c
@@ -120,7 +120,7 @@ type-double-routines := branred doasin dosincos halfulp mpa mpatan2 \
# float support
type-float-suffix := f
type-float-routines := k_rem_pio2f math_errf e_exp2f_data e_logf_data \
- e_log2f_data e_powf_log2_data
+ e_log2f_data e_powf_log2_data s_sincosf_data
# _Float128 support
type-float128-suffix := f128
@@ -161,4 +161,15 @@ extern const struct powf_log2_data
double poly[POWF_LOG2_POLY_ORDER];
} __powf_log2_data attribute_hidden;
+/* Shared between sinf, cosf and sincosf. */
+#define SINCOSF_PIO2_TABLE 6
+#define SINCOSF_INVPIO4_TABLE 8
+#define SINCOSF_DOUBLE_ONES 2
+extern const struct sincosf_data
+{
+ double pio2_table[SINCOSF_PIO2_TABLE];
+ double invpio4_table[SINCOSF_INVPIO4_TABLE];
+ double ones[SINCOSF_DOUBLE_ONES];
+} __sincosf_data attribute_hidden;
+
#endif
@@ -16,6 +16,8 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
+#include "math_config.h"
+
/* Chebyshev constants for cos, range -PI/4 - PI/4. */
static const double C0 = -0x1.ffffffffe98aep-2;
static const double C1 = 0x1.55555545c50c7p-5;
@@ -48,27 +50,9 @@ static const double inv_PI_4 = 0x1.45f306dc9c883p+0; /* 4/PI. */
#define FLOAT_EXPONENT_SHIFT 23
#define FLOAT_EXPONENT_BIAS 127
-static const double pio2_table[] = {
- 0 * M_PI_2,
- 1 * M_PI_2,
- 2 * M_PI_2,
- 3 * M_PI_2,
- 4 * M_PI_2,
- 5 * M_PI_2
-};
-
-static const double invpio4_table[] = {
- 0x0p+0,
- 0x1.45f306cp+0,
- 0x1.c9c882ap-28,
- 0x1.4fe13a8p-58,
- 0x1.f47d4dp-85,
- 0x1.bb81b6cp-112,
- 0x1.4acc9ep-142,
- 0x1.0e4107cp-169
-};
-
-static const double ones[] = { 1.0, -1.0 };
+#define pio2_table __sincosf_data.pio2_table
+#define invpio4_table __sincosf_data.invpio4_table
+#define ones __sincosf_data.ones
/* Compute the sine value using Chebyshev polynomials where
THETA is the range reduced absolute value of the input
new file mode 100644
@@ -0,0 +1,41 @@
+/* Shared data between sinf, cosf and sincosf.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "math_config.h"
+
+const struct sincosf_data __sincosf_data = {
+ .pio2_table = {
+ 0 * M_PI_2,
+ 1 * M_PI_2,
+ 2 * M_PI_2,
+ 3 * M_PI_2,
+ 4 * M_PI_2,
+ 5 * M_PI_2
+ },
+ .invpio4_table = {
+ 0x0p+0,
+ 0x1.45f306cp+0,
+ 0x1.c9c882ap-28,
+ 0x1.4fe13a8p-58,
+ 0x1.f47d4dp-85,
+ 0x1.bb81b6cp-112,
+ 0x1.4acc9ep-142,
+ 0x1.0e4107cp-169
+ },
+ .ones = { 1.0, -1.0 }
+};
--
2.14.3