[1/3] New generic sincosf

Message ID	CAMe9rOq6PZ17V-sMJm50Lxw8Ko+4b37fb8n6JDP_F+m8iSG-Vg@mail.gmail.com
State	New, archived
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org MIME-Version: 1.0 In-Reply-To: <CAMe9rOq+-Vd=VOku-gbH=xSUd00rD=Sm6MhCvD6Sf26rMJ02hg@mail.gmail.com> References: <1513143801-3290-1-git-send-email-raji@linux.vnet.ibm.com> <1513143801-3290-2-git-send-email-raji@linux.vnet.ibm.com> <alpine.DEB.2.20.1712150029250.5909@digraph.polyomino.org.uk> <CAMe9rOq3VHYR5prK-oHLB+YjHEJXKU8+gCU_XTsOX12n3RvcQA@mail.gmail.com> <alpine.DEB.2.20.1712151408290.3687@digraph.polyomino.org.uk> <CAMe9rOo-gR0XUuZ2oxEwMj71hukysA75XEhqmsfnx_x414ieqw@mail.gmail.com> <alpine.DEB.2.20.1712151547280.13157@digraph.polyomino.org.uk> <CAMe9rOqD=yXstUu1QZ8zBn268UsQL_iC-VxEntpYaKn_kbwcCA@mail.gmail.com> <alpine.DEB.2.20.1712151620250.13157@digraph.polyomino.org.uk> <CAMe9rOq+-Vd=VOku-gbH=xSUd00rD=Sm6MhCvD6Sf26rMJ02hg@mail.gmail.com> From: "H.J. Lu" <hjl.tools@gmail.com> Date: Fri, 15 Dec 2017 12:34:20 -0800 Message-ID: <CAMe9rOq6PZ17V-sMJm50Lxw8Ko+4b37fb8n6JDP_F+m8iSG-Vg@mail.gmail.com> Subject: Re: [PATCH 1/3] New generic sincosf To: Joseph Myers <joseph@codesourcery.com> Cc: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>, GNU C Library <libc-alpha@sourceware.org> Content-Type: multipart/mixed; boundary="001a11477800fc3b22056066ece6"

Message ID

CAMe9rOq6PZ17V-sMJm50Lxw8Ko+4b37fb8n6JDP_F+m8iSG-Vg@mail.gmail.com

State

New, archived

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
MIME-Version: 1.0
In-Reply-To: <CAMe9rOq+-Vd=VOku-gbH=xSUd00rD=Sm6MhCvD6Sf26rMJ02hg@mail.gmail.com>
References: <1513143801-3290-1-git-send-email-raji@linux.vnet.ibm.com>
	<1513143801-3290-2-git-send-email-raji@linux.vnet.ibm.com>
	<alpine.DEB.2.20.1712150029250.5909@digraph.polyomino.org.uk>
	<CAMe9rOq3VHYR5prK-oHLB+YjHEJXKU8+gCU_XTsOX12n3RvcQA@mail.gmail.com>
	<alpine.DEB.2.20.1712151408290.3687@digraph.polyomino.org.uk>
	<CAMe9rOo-gR0XUuZ2oxEwMj71hukysA75XEhqmsfnx_x414ieqw@mail.gmail.com>
	<alpine.DEB.2.20.1712151547280.13157@digraph.polyomino.org.uk>
	<CAMe9rOqD=yXstUu1QZ8zBn268UsQL_iC-VxEntpYaKn_kbwcCA@mail.gmail.com>
	<alpine.DEB.2.20.1712151620250.13157@digraph.polyomino.org.uk>
	<CAMe9rOq+-Vd=VOku-gbH=xSUd00rD=Sm6MhCvD6Sf26rMJ02hg@mail.gmail.com>
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 15 Dec 2017 12:34:20 -0800
Message-ID: <CAMe9rOq6PZ17V-sMJm50Lxw8Ko+4b37fb8n6JDP_F+m8iSG-Vg@mail.gmail.com>
Subject: Re: [PATCH 1/3] New generic sincosf
To: Joseph Myers <joseph@codesourcery.com>
Cc: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>,
	GNU C Library <libc-alpha@sourceware.org>
Content-Type: multipart/mixed; boundary="001a11477800fc3b22056066ece6"

Commit Message

H.J. Lu Dec. 15, 2017, 8:34 p.m. UTC

  On Fri, Dec 15, 2017 at 8:34 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 8:24 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>
>>> On Fri, Dec 15, 2017 at 7:48 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>>> > On Fri, 15 Dec 2017, H.J. Lu wrote:
>>> >
>>> >> > glibc is build with -fmerge-all-constants, so constants should already be
>>> >> > shared at link time; making constants hidden would be relevant only for
>>> >> > arrays, not for individual floating-point numbers (and if the individual
>>> >> > constants aren't put in appropriate sections with link-time merging, it's
>>> >> > probably because not doing so is more efficient on a particular
>>> >> > architecture).
>>> >>
>>> >> What do you suggest for x86-64 to avoid array duplication?
>>> >
>>> > Well, a followup patch to refactor the arrays into hidden variables would
>>> > be reasonable once the sincosf patch is in - but it would need
>>> > benchmarking to make sure it doesn't adversely affect performance.
>>>
>>> Do you have any suspicions to indicate that hidden array may be
>>> slower than local one to access?
>>
>> On some architectures, GCC uses section anchors for access to variables
>> defined in the same source file.  In such cases, moving the arrays to
>> being hidden and defined separately means that the address of each array
>> needs to be loaded separately because the offsets between different
>> variables are no longer known when the source files using them are being
>> compiled, whereas if defined in each source file, a function accessing
>> multiple arrays could load the anchor address and them compute all the
>> other addresses with small offsets from it.
>
> This won't show up on x86.
>
>> You should be able to avoid that problem by putting all the data in a
>> single structure, as done in e_exp2f_data.c, rather than separate hidden
>> variables.
>>
>
> Good to know.
>
> I have no objections now.  I will prepare a patch after s_sincosf.c is checked
> in.
>

One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
to  sysdeps/ieee754/flt-32/s_sincosf.h?

I am enclosing a patch to add sysdeps/ieee754/flt-32/s_sincosf_data.c.
It saves 352 bytes on x86-64.

H.J.

Comments

Joseph Myers Dec. 15, 2017, 8:50 p.m. UTC | #1

On Fri, 15 Dec 2017, H.J. Lu wrote:

> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
> to  sysdeps/ieee754/flt-32/s_sincosf.h?

Yes, I think so.

Rajalakshmi S Dec. 16, 2017, 8:52 a.m. UTC | #2

On 12/16/2017 02:20 AM, Joseph Myers wrote:
> On Fri, 15 Dec 2017, H.J. Lu wrote:
> 
>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
> 
> Yes, I think so.
> 

Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
name changed as s_sincosf.h.  Thanks for the review.

H.J. Lu Dec. 16, 2017, 2:43 p.m. UTC | #3

On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
>
>
> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>
>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>
>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>
>>
>> Yes, I think so.
>>
>
> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
> name changed as s_sincosf.h.  Thanks for the review.
>

I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:

Generic:

    "max": 276.971,
    "min": 10.813,
    "mean": 29.3755

SSE2:

    "max": 138.795,
    "min": 11.686,
    "mean": 22.9463

The SSE2 is 28% faster.  Do they use the same algorithm?

Rajalakshmi S Dec. 18, 2017, 3:26 a.m. UTC | #4

On 12/16/2017 08:13 PM, H.J. Lu wrote:
> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
> <raji@linux.vnet.ibm.com> wrote:
>>
>>
>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>
>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>
>>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>>
>>>
>>> Yes, I think so.
>>>
>>
>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>> name changed as s_sincosf.h.  Thanks for the review.
>>
> 
> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
> 
> Generic:
> 
>      "max": 276.971,
>      "min": 10.813,
>      "mean": 29.3755
> 
> SSE2:
> 
>      "max": 138.795,
>      "min": 11.686,
>      "mean": 22.9463
> 
> The SSE2 is 28% faster.  Do they use the same algorithm?

Yes, they are same. One small difference is generic version calls 
reduced_sin() and reduced_cos() whereas asm version handles both in the 
same branch for reconstruction.

> 
>

H.J. Lu Dec. 19, 2017, 4:35 p.m. UTC | #5

On Sun, Dec 17, 2017 at 7:26 PM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
>
>
> On 12/16/2017 08:13 PM, H.J. Lu wrote:
>>
>> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
>> <raji@linux.vnet.ibm.com> wrote:
>>>
>>>
>>>
>>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>>
>>>>
>>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>>
>>>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>>>
>>>>
>>>>
>>>> Yes, I think so.
>>>>
>>>
>>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>>> name changed as s_sincosf.h.  Thanks for the review.
>>>
>>
>> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
>>
>> Generic:
>>
>>      "max": 276.971,
>>      "min": 10.813,
>>      "mean": 29.3755
>>
>> SSE2:
>>
>>      "max": 138.795,
>>      "min": 11.686,
>>      "mean": 22.9463
>>
>> The SSE2 is 28% faster.  Do they use the same algorithm?
>
>
> Yes, they are same. One small difference is generic version calls
> reduced_sin() and reduced_cos() whereas asm version handles both in the same
> branch for reconstruction.
>

Can generic version do the same?

Rajalakshmi S Jan. 1, 2018, 10:16 a.m. UTC | #6

On 12/19/2017 10:05 PM, H.J. Lu wrote:
> On Sun, Dec 17, 2017 at 7:26 PM, Rajalakshmi Srinivasaraghavan
> <raji@linux.vnet.ibm.com> wrote:
>>
>>
>> On 12/16/2017 08:13 PM, H.J. Lu wrote:
>>>
>>> On Sat, Dec 16, 2017 at 12:52 AM, Rajalakshmi Srinivasaraghavan
>>> <raji@linux.vnet.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/16/2017 02:20 AM, Joseph Myers wrote:
>>>>>
>>>>>
>>>>> On Fri, 15 Dec 2017, H.J. Lu wrote:
>>>>>
>>>>>> One comment,  should sysdeps/ieee754/flt-32/s_sincos.h be renamed
>>>>>> to  sysdeps/ieee754/flt-32/s_sincosf.h?
>>>>>
>>>>>
>>>>>
>>>>> Yes, I think so.
>>>>>
>>>>
>>>> Pushed as 984ae9967b49830173490a33ae6130880f3f70d9 with the file
>>>> name changed as s_sincosf.h.  Thanks for the review.
>>>>
>>>
>>> I noticed that sysdeps/x86_64/fpu/s_sincosf.S is still faster:
>>>
>>> Generic:
>>>
>>>       "max": 276.971,
>>>       "min": 10.813,
>>>       "mean": 29.3755
>>>
>>> SSE2:
>>>
>>>       "max": 138.795,
>>>       "min": 11.686,
>>>       "mean": 22.9463
>>>
>>> The SSE2 is 28% faster.  Do they use the same algorithm?
>>
>>
>> Yes, they are same. One small difference is generic version calls
>> reduced_sin() and reduced_cos() whereas asm version handles both in the same
>> branch for reconstruction.
>>
> 
> Can generic version do the same?

I don't see any difference in mean value with that change.

> 
> 
>

diff mbox

Patch

From 9ba68929320c3bdb50b3aadc5d024318362fbfba Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 15 Dec 2017 12:08:22 -0800
Subject: [PATCH] Add s_sincosf_data.c

Add s_sincosf_data.c to data between sinf, cosf and sincosf.

Tested on x86-64.  It reduced the read-only data size by 352 bytes.

	* math/Makefile (type-float-routines): Add s_sincosf_data.
	* sysdeps/ieee754/flt-32/math_config.h (SINCOSF_PIO2_TABLE):
	New.
	(SINCOSF_INVPIO4_TABLE): Likewise.
	(SINCOSF_DOUBLE_ONES): Likewise.
	(__sincosf_data): Likewise.
	* sysdeps/ieee754/flt-32/s_sincos.h: Include "math_config.h".
	(pio2_table): Replaced with __sincosf_data.
	(invpio4_table): Likewise.
	(ones): Likewise.
	* sysdeps/ieee754/flt-32/s_sincosf_data.c: New file.
---
 math/Makefile                           |  2 +-
 sysdeps/ieee754/flt-32/math_config.h    | 11 +++++++++
 sysdeps/ieee754/flt-32/s_sincos.h       | 26 ++++-----------------
 sysdeps/ieee754/flt-32/s_sincosf_data.c | 41 +++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 22 deletions(-)
 create mode 100644 sysdeps/ieee754/flt-32/s_sincosf_data.c

diff --git a/math/Makefile b/math/Makefile
index cba9ce9405..8762f20059 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -120,7 +120,7 @@  type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
 # float support
 type-float-suffix := f
 type-float-routines := k_rem_pio2f math_errf e_exp2f_data e_logf_data	\
-		       e_log2f_data e_powf_log2_data
+		       e_log2f_data e_powf_log2_data s_sincosf_data
 
 # _Float128 support
 type-float128-suffix := f128
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index e5a830b442..c2bc72805c 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -161,4 +161,15 @@  extern const struct powf_log2_data
   double poly[POWF_LOG2_POLY_ORDER];
 } __powf_log2_data attribute_hidden;
 
+/* Shared between sinf, cosf and sincosf.  */
+#define SINCOSF_PIO2_TABLE	6
+#define SINCOSF_INVPIO4_TABLE	8
+#define SINCOSF_DOUBLE_ONES	2
+extern const struct sincosf_data
+{
+  double pio2_table[SINCOSF_PIO2_TABLE];
+  double invpio4_table[SINCOSF_INVPIO4_TABLE];
+  double ones[SINCOSF_DOUBLE_ONES];
+} __sincosf_data attribute_hidden;
+
 #endif
diff --git a/sysdeps/ieee754/flt-32/s_sincos.h b/sysdeps/ieee754/flt-32/s_sincos.h
index b0110fc2af..89855c757f 100644
--- a/sysdeps/ieee754/flt-32/s_sincos.h
+++ b/sysdeps/ieee754/flt-32/s_sincos.h
@@ -16,6 +16,8 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include "math_config.h"
+
 /* Chebyshev constants for cos, range -PI/4 - PI/4.  */
 static const double C0 = -0x1.ffffffffe98aep-2;
 static const double C1 =  0x1.55555545c50c7p-5;
@@ -48,27 +50,9 @@  static const double inv_PI_4 = 0x1.45f306dc9c883p+0; /* 4/PI.  */
 #define FLOAT_EXPONENT_SHIFT 23
 #define FLOAT_EXPONENT_BIAS 127
 
-static const double pio2_table[] = {
-  0 * M_PI_2,
-  1 * M_PI_2,
-  2 * M_PI_2,
-  3 * M_PI_2,
-  4 * M_PI_2,
-  5 * M_PI_2
-};
-
-static const double invpio4_table[] = {
-  0x0p+0,
-  0x1.45f306cp+0,
-  0x1.c9c882ap-28,
-  0x1.4fe13a8p-58,
-  0x1.f47d4dp-85,
-  0x1.bb81b6cp-112,
-  0x1.4acc9ep-142,
-  0x1.0e4107cp-169
-};
-
-static const double ones[] = { 1.0, -1.0 };
+#define pio2_table	__sincosf_data.pio2_table
+#define invpio4_table	__sincosf_data.invpio4_table
+#define ones		__sincosf_data.ones
 
 /* Compute the sine value using Chebyshev polynomials where
    THETA is the range reduced absolute value of the input
diff --git a/sysdeps/ieee754/flt-32/s_sincosf_data.c b/sysdeps/ieee754/flt-32/s_sincosf_data.c
new file mode 100644
index 0000000000..d999ac5dfc
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/s_sincosf_data.c
@@ -0,0 +1,41 @@ 
+/* Shared data between sinf, cosf and sincosf.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "math_config.h"
+
+const struct sincosf_data __sincosf_data = {
+  .pio2_table = {
+    0 * M_PI_2,
+    1 * M_PI_2,
+    2 * M_PI_2,
+    3 * M_PI_2,
+    4 * M_PI_2,
+    5 * M_PI_2
+  },
+  .invpio4_table = {
+    0x0p+0,
+    0x1.45f306cp+0,
+    0x1.c9c882ap-28,
+    0x1.4fe13a8p-58,
+    0x1.f47d4dp-85,
+    0x1.bb81b6cp-112,
+    0x1.4acc9ep-142,
+    0x1.0e4107cp-169
+  },
+  .ones = { 1.0, -1.0 }
+};
-- 
2.14.3