[1/3] New generic sincosf

Message ID CAMe9rOq6PZ17V-sMJm50Lxw8Ko+4b37fb8n6JDP_F+m8iSG-Vg@mail.gmail.com
State New, archived
Headers

Commit Message

H.J. Lu Dec. 15, 2017, 8:34 p.m. UTC
  On Fri, Dec 15, 2017 at 8:34 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 8:24 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>
>>> On Fri, Dec 15, 2017 at 7:48 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>>> > On Fri, 15 Dec 2017, H.J. Lu wrote:
>>> >
>>> >> > glibc is build with -fmerge-all-constants, so constants should already be
>>> >> > shared at link time; making constants hidden would be relevant only for
>>> >> > arrays, not for individual floating-point numbers (and if the individual
>>> >> > constants aren't put in appropriate sections with link-time merging, it's
>>> >> > probably because not doing so is more efficient on a particular
>>> >> > architecture).
>>> >>
>>> >> What do you suggest for x86-64 to avoid array duplication?
>>> >
>>> > Well, a followup patch to refactor the arrays into hidden variables would
>>> > be reasonable once the sincosf patch is in - but it would need
>>> > benchmarking to make sure it doesn't adversely affect performance.
>>>
>>> Do you have any suspicions to indicate that hidden array may be
>>> slower than local one to access?
>>
>> On some architectures, GCC uses section anchors for access to variables
>> defined in the same source file.  In such cases, moving the arrays to
>> being hidden and defined separately means that the address of each array
>> needs to be loaded separately because the offsets between different
>> variables are no longer known when the source files using them are being
>> compiled, whereas if defined in each source file, a function accessing
>> multiple arrays could load the anchor address and them compute all the
>> other addresses with small offsets from it.
>
> This won't show up on x86.
>
>> You should be able to avoid that problem by putting all the data in a
>> single structure, as done in e_exp2f_data.c, rather than separate hidden
>> variables.
>>
>
> Good to know.
>
> I have no objections now.  I will prepare a patch after s_sincosf.c is checked
> in.
>

One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
to  sysdeps/ieee754/flt-32/s_sincosf.h?

I am enclosing a patch to add sysdeps/ieee754/flt-32/s_sincosf_data.c.
It saves 352 bytes on x86-64.

H.J.
  

Comments

Joseph Myers Dec. 15, 2017, 8:50 p.m. UTC | #1
On Fri, 15 Dec 2017, H.J. Lu wrote:

> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
> to  sysdeps/ieee754/flt-32/s_sincosf.h?

Yes, I think so.
  
Rajalakshmi S Dec. 16, 2017, 8:52 a.m. UTC | #2
On 12/16/2017 02:20 AM, Joseph Myers wrote:
> On Fri, 15 Dec 2017, H.J. Lu wrote:
> 
>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
> 
> Yes, I think so.
> 

Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
name changed as s_sincosf.h.  Thanks for the review.
  
H.J. Lu Dec. 16, 2017, 2:43 p.m. UTC | #3
On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
>
>
> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>
>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>
>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>
>>
>> Yes, I think so.
>>
>
> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
> name changed as s_sincosf.h.  Thanks for the review.
>

I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:

Generic:

    "max": 276.971,
    "min": 10.813,
    "mean": 29.3755

SSE2:

    "max": 138.795,
    "min": 11.686,
    "mean": 22.9463

The SSE2 is 28% faster.  Do they use the same algorithm?
  
Rajalakshmi S Dec. 18, 2017, 3:26 a.m. UTC | #4
On 12/16/2017 08:13 PM, H.J. Lu wrote:
> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
> <raji@linux.vnet.ibm.com> wrote:
>>
>>
>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>
>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>
>>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>>
>>>
>>> Yes, I think so.
>>>
>>
>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>> name changed as s_sincosf.h.  Thanks for the review.
>>
> 
> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
> 
> Generic:
> 
>      "max": 276.971,
>      "min": 10.813,
>      "mean": 29.3755
> 
> SSE2:
> 
>      "max": 138.795,
>      "min": 11.686,
>      "mean": 22.9463
> 
> The SSE2 is 28% faster.  Do they use the same algorithm?

Yes, they are same. One small difference is generic version calls 
reduced_sin() and reduced_cos() whereas asm version handles both in the 
same branch for reconstruction.

> 
>
  
H.J. Lu Dec. 19, 2017, 4:35 p.m. UTC | #5
On Sun, Dec 17, 2017 at 7:26 PM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
>
>
> On 12/16/2017 08:13 PM, H.J. Lu wrote:
>>
>> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
>> <raji@linux.vnet.ibm.com> wrote:
>>>
>>>
>>>
>>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>>
>>>>
>>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>>
>>>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>>>
>>>>
>>>>
>>>> Yes, I think so.
>>>>
>>>
>>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>>> name changed as s_sincosf.h.  Thanks for the review.
>>>
>>
>> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
>>
>> Generic:
>>
>>      "max": 276.971,
>>      "min": 10.813,
>>      "mean": 29.3755
>>
>> SSE2:
>>
>>      "max": 138.795,
>>      "min": 11.686,
>>      "mean": 22.9463
>>
>> The SSE2 is 28% faster.  Do they use the same algorithm?
>
>
> Yes, they are same. One small difference is generic version calls
> reduced_sin() and reduced_cos() whereas asm version handles both in the same
> branch for reconstruction.
>

Can generic version do the same?
  
Rajalakshmi S Jan. 1, 2018, 10:16 a.m. UTC | #6
On 12/19/2017 10:05 PM, H.J. Lu wrote:
> On Sun, Dec 17, 2017 at 7:26 PM, Rajalakshmi Srinivasaraghavan
> <raji@linux.vnet.ibm.com> wrote:
>>
>>
>> On 12/16/2017 08:13 PM, H.J. Lu wrote:
>>>
>>> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
>>> <raji@linux.vnet.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>>>
>>>>>
>>>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>>>
>>>>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>>>>
>>>>>
>>>>>
>>>>> Yes, I think so.
>>>>>
>>>>
>>>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>>>> name changed as s_sincosf.h.  Thanks for the review.
>>>>
>>>
>>> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
>>>
>>> Generic:
>>>
>>>       "max": 276.971,
>>>       "min": 10.813,
>>>       "mean": 29.3755
>>>
>>> SSE2:
>>>
>>>       "max": 138.795,
>>>       "min": 11.686,
>>>       "mean": 22.9463
>>>
>>> The SSE2 is 28% faster.  Do they use the same algorithm?
>>
>>
>> Yes, they are same. One small difference is generic version calls
>> reduced_sin() and reduced_cos() whereas asm version handles both in the same
>> branch for reconstruction.
>>
> 
> Can generic version do the same?

I don't see any difference in mean value with that change.

> 
> 
>
  

Patch

From 9ba68929320c3bdb50b3aadc5d024318362fbfba Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 15 Dec 2017 12:08:22 -0800
Subject: [PATCH] Add s_sincosf_data.c

Add s_sincosf_data.c to data between sinf, cosf and sincosf.

Tested on x86-64.  It reduced the read-only data size by 352 bytes.

	* math/Makefile (type-float-routines): Add s_sincosf_data.
	* sysdeps/ieee754/flt-32/math_config.h (SINCOSF_PIO2_TABLE):
	New.
	(SINCOSF_INVPIO4_TABLE): Likewise.
	(SINCOSF_DOUBLE_ONES): Likewise.
	(__sincosf_data): Likewise.
	* sysdeps/ieee754/flt-32/s_sincos.h: Include "math_config.h".
	(pio2_table): Replaced with __sincosf_data.
	(invpio4_table): Likewise.
	(ones): Likewise.
	* sysdeps/ieee754/flt-32/s_sincosf_data.c: New file.
---
 math/Makefile                           |  2 +-
 sysdeps/ieee754/flt-32/math_config.h    | 11 +++++++++
 sysdeps/ieee754/flt-32/s_sincos.h       | 26 ++++-----------------
 sysdeps/ieee754/flt-32/s_sincosf_data.c | 41 +++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 22 deletions(-)
 create mode 100644 sysdeps/ieee754/flt-32/s_sincosf_data.c

diff --git a/math/Makefile b/math/Makefile
index cba9ce9405..8762f20059 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -120,7 +120,7 @@  type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
 # float support
 type-float-suffix := f
 type-float-routines := k_rem_pio2f math_errf e_exp2f_data e_logf_data	\
-		       e_log2f_data e_powf_log2_data
+		       e_log2f_data e_powf_log2_data s_sincosf_data
 
 # _Float128 support
 type-float128-suffix := f128
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index e5a830b442..c2bc72805c 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -161,4 +161,15 @@  extern const struct powf_log2_data
   double poly[POWF_LOG2_POLY_ORDER];
 } __powf_log2_data attribute_hidden;
 
+/* Shared between sinf, cosf and sincosf.  */
+#define SINCOSF_PIO2_TABLE	6
+#define SINCOSF_INVPIO4_TABLE	8
+#define SINCOSF_DOUBLE_ONES	2
+extern const struct sincosf_data
+{
+  double pio2_table[SINCOSF_PIO2_TABLE];
+  double invpio4_table[SINCOSF_INVPIO4_TABLE];
+  double ones[SINCOSF_DOUBLE_ONES];
+} __sincosf_data attribute_hidden;
+
 #endif
diff --git a/sysdeps/ieee754/flt-32/s_sincos.h b/sysdeps/ieee754/flt-32/s_sincos.h
index b0110fc2af..89855c757f 100644
--- a/sysdeps/ieee754/flt-32/s_sincos.h
+++ b/sysdeps/ieee754/flt-32/s_sincos.h
@@ -16,6 +16,8 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include "math_config.h"
+
 /* Chebyshev constants for cos, range -PI/4 - PI/4.  */
 static const double C0 = -0x1.ffffffffe98aep-2;
 static const double C1 =  0x1.55555545c50c7p-5;
@@ -48,27 +50,9 @@  static const double inv_PI_4 = 0x1.45f306dc9c883p+0; /* 4/PI.  */
 #define FLOAT_EXPONENT_SHIFT 23
 #define FLOAT_EXPONENT_BIAS 127
 
-static const double pio2_table[] = {
-  0 * M_PI_2,
-  1 * M_PI_2,
-  2 * M_PI_2,
-  3 * M_PI_2,
-  4 * M_PI_2,
-  5 * M_PI_2
-};
-
-static const double invpio4_table[] = {
-  0x0p+0,
-  0x1.45f306cp+0,
-  0x1.c9c882ap-28,
-  0x1.4fe13a8p-58,
-  0x1.f47d4dp-85,
-  0x1.bb81b6cp-112,
-  0x1.4acc9ep-142,
-  0x1.0e4107cp-169
-};
-
-static const double ones[] = { 1.0, -1.0 };
+#define pio2_table	__sincosf_data.pio2_table
+#define invpio4_table	__sincosf_data.invpio4_table
+#define ones		__sincosf_data.ones
 
 /* Compute the sine value using Chebyshev polynomials where
    THETA is the range reduced absolute value of the input
diff --git a/sysdeps/ieee754/flt-32/s_sincosf_data.c b/sysdeps/ieee754/flt-32/s_sincosf_data.c
new file mode 100644
index 0000000000..d999ac5dfc
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/s_sincosf_data.c
@@ -0,0 +1,41 @@ 
+/* Shared data between sinf, cosf and sincosf.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "math_config.h"
+
+const struct sincosf_data __sincosf_data = {
+  .pio2_table = {
+    0 * M_PI_2,
+    1 * M_PI_2,
+    2 * M_PI_2,
+    3 * M_PI_2,
+    4 * M_PI_2,
+    5 * M_PI_2
+  },
+  .invpio4_table = {
+    0x0p+0,
+    0x1.45f306cp+0,
+    0x1.c9c882ap-28,
+    0x1.4fe13a8p-58,
+    0x1.f47d4dp-85,
+    0x1.bb81b6cp-112,
+    0x1.4acc9ep-142,
+    0x1.0e4107cp-169
+  },
+  .ones = { 1.0, -1.0 }
+};
-- 
2.14.3