From patchwork Mon Mar 12 15:26:00 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
X-Patchwork-Id: 26284
Received: (qmail 112453 invoked by alias); 12 Mar 2018 15:26:06 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 112444 invoked by uid 89); 12 Mar 2018 15:26:05 -0000
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>,
	"sellcey@cavium.com" <sellcey@cavium.com>
CC: nd <nd@arm.com>
Subject: Re: [PATCH 4/6] Remove slow paths from sin/cos
Date: Mon, 12 Mar 2018 15:26:00 +0000
Message-ID: 
 <DB6PR0801MB2053A64A7D76E0A8B449623283D30@DB6PR0801MB2053.eurprd08.prod.outlook.com>
References: 
 <DB6PR0801MB2053407B5FE9C9DF2174313683DE0@DB6PR0801MB2053.eurprd08.prod.outlook.com>,
	<1520632203.6774.151.camel@cavium.com>
In-Reply-To: <1520632203.6774.151.camel@cavium.com>

Steve Ellcey wrote:

> Did this patch get mangled in transit or something?  I could apply
> patches 1 through 3 with no problem (and no fuzzing) to ToT but when I
> got to 4 I get errors.  I may have mangled something on this end but I
> treated this patch just like 1 through 3 and those worked fine.  5 and
> 6 have issues too but I think that is because 4 did not apply cleanly.

Sorry, it looks like a few lines went missing in the cut-and-paste. Here is the full version:

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 3b748821f6e5f817dc234ec7f96d951910299e21..5966282db60224528fea2bf55a05dd4120ab12a9 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -67,11 +67,10 @@
 
    The constants s1, s2, s3, etc. are pre-computed values of 1/3!, 1/5! and so
    on.  The result is returned to LHS and correction in COR.  */
-#define TAYLOR_SIN(xx, a, da, cor) \
+#define TAYLOR_SIN(xx, a, da) \
 ({									      \
   double t = ((POLYNOMIAL (xx)  * (a) - 0.5 * (da))  * (xx) + (da));	      \
   double res = (a) + t;							      \
-  (cor) = ((a) - res) + t;						      \
   res;									      \
 })
 
@@ -145,10 +144,10 @@ static double cslow2 (double x);
 /* Given a number partitioned into X and DX, this function computes the cosine
    of the number by combining the sin and cos of X (as computed by a variation
    of the Taylor series) with the values looked up from the sin/cos table to
-   get the result in RES and a correction value in COR.  */
+   get the result.  */
 static inline double
 __always_inline
-do_cos (double x, double dx, double *corp)
+do_cos (double x, double dx)
 {
   mynumber u;
 
@@ -158,16 +157,13 @@ do_cos (double x, double dx, double *corp)
   u.x = big + fabs (x);
   x = fabs (x) - (u.x - big) + dx;
 
-  double xx, s, sn, ssn, c, cs, ccs, res, cor;
+  double xx, s, sn, ssn, c, cs, ccs, cor;
   xx = x * x;
   s = x + x * xx * (sn3 + xx * sn5);
   c = xx * (cs2 + xx * (cs4 + xx * cs6));
   SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
   cor = (ccs - s * ssn - cs * c) - sn * s;
-  res = cs + cor;
-  cor = (cs - res) + cor;
-  *corp = cor;
-  return res;
+  return cs + cor;
 }
 
 /* A more precise variant of DO_COS.  EPS is the adjustment to the correction
@@ -207,10 +203,10 @@ do_cos_slow (double x, double dx, double eps, double *corp)
 /* Given a number partitioned into X and DX, this function computes the sine of
    the number by combining the sin and cos of X (as computed by a variation of
    the Taylor series) with the values looked up from the sin/cos table to get
-   the result in RES and a correction value in COR.  */
+   the result.  */
 static inline double
 __always_inline
-do_sin (double x, double dx, double *corp)
+do_sin (double x, double dx)
 {
   mynumber u;
 
@@ -219,16 +215,13 @@ do_sin (double x, double dx, double *corp)
   u.x = big + fabs (x);
   x = fabs (x) - (u.x - big);
 
-  double xx, s, sn, ssn, c, cs, ccs, cor, res;
+  double xx, s, sn, ssn, c, cs, ccs, cor;
   xx = x * x;
   s = x + (dx + x * xx * (sn3 + xx * sn5));
   c = x * dx + xx * (cs2 + xx * (cs4 + xx * cs6));
   SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
   cor = (ssn + s * ccs - sn * c) + cs * s;
-  res = sn + cor;
-  cor = (sn - res) + cor;
-  *corp = cor;
-  return res;
+  return sn + cor;
 }
 
 /* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
@@ -340,19 +333,19 @@ static double
 __always_inline
 do_sincos (double a, double da, int4 n)
 {
-  double retval, cor;
+  double retval;
 
   if (n & 1)
     /* Max ULP is 0.513.  */
-    retval = do_cos (a, da, &cor);
+    retval = do_cos (a, da);
   else
     {
       double xx = a * a;
       /* Max ULP is 0.501 if xx < 0.01588, otherwise ULP is 0.518.  */
       if (xx < 0.01588)
-	retval = TAYLOR_SIN (xx, a, da, cor);
+	retval = TAYLOR_SIN (xx, a, da);
       else
-	retval = __copysign (do_sin (a, da, &cor), a);
+	retval = __copysign (do_sin (a, da), a);
     }
 
   return (n & 2) ? -retval : retval;
@@ -371,7 +364,7 @@ SECTION
 #endif
 __sin (double x)
 {
-  double xx, t, a, da, cor;
+  double xx, t, a, da;
   mynumber u;
   int4 k, m, n;
   double retval = 0;
@@ -401,7 +394,7 @@ __sin (double x)
   else if (k < 0x3feb6000)
     {
       /* Max ULP is 0.548.  */
-      retval = __copysign (do_sin (x, 0, &cor), x);
+      retval = __copysign (do_sin (x, 0), x);
     }				/*   else  if (k < 0x3feb6000)    */
 
 /*----------------------- 0.855469  <|x|<2.426265  ----------------------*/
@@ -409,7 +402,7 @@ __sin (double x)
     {
       t = hp0 - fabs (x);
       /* Max ULP is 0.51.  */
-      retval = __copysign (do_cos (t, hp1, &cor), x);
+      retval = __copysign (do_cos (t, hp1), x);
     }				/*   else  if (k < 0x400368fd)    */
 
 #ifndef IN_SINCOS
@@ -422,8 +415,10 @@ __sin (double x)
 
 /* --------------------105414350 <|x| <2^1024------------------------------*/
   else if (k < 0x7ff00000)
-    retval = reduce_and_compute (x, false);
-
+    {
+      n = __branred (x, &a, &da);
+      retval = do_sincos (a, da, n);
+    }
 /*--------------------- |x| > 2^1024 ----------------------------------*/
   else
     {
@@ -455,7 +450,7 @@ SECTION
 #endif
 __cos (double x)
 {
-  double y, xx, cor, a, da;
+  double y, xx, a, da;
   mynumber u;
   int4 k, m, n;
 
@@ -476,7 +471,7 @@ __cos (double x)
   else if (k < 0x3feb6000)
     {				/* 2^-27 < |x| < 0.855469 */
       /* Max ULP is 0.51. */
-      retval = do_cos (x, 0, &cor);
+      retval = do_cos (x, 0);
     }				/*   else  if (k < 0x3feb6000)    */
 
   else if (k < 0x400368fd)
@@ -488,9 +483,9 @@ __cos (double x)
       /* Max ULP is 0.501 if xx < 0.01588 or 0.518 otherwise.
 	 Range reduction uses 106 bits here which is sufficient.  */
       if (xx < 0.01588)
-	retval = TAYLOR_SIN (xx, a, da, cor);
+	retval = TAYLOR_SIN (xx, a, da);
       else
-	retval = __copysign (do_sin (a, da, &cor), a);
+	retval = __copysign (do_sin (a, da), a);
     }				/*   else  if (k < 0x400368fd)    */
 
 
@@ -503,7 +498,10 @@ __cos (double x)
 
   /* 105414350 <|x| <2^1024 */
   else if (k < 0x7ff00000)
-    retval = reduce_and_compute (x, true);
+    {
+      n = __branred (x, &a, &da);
+      retval = do_sincos (a, da, n + 1);
+    }
 
   else
     {
diff --git a/sysdeps/ieee754/dbl-64/s_sincos.c b/sysdeps/ieee754/dbl-64/s_sincos.c
index 4f032d2e42593ccde22169b374728386dd8fca8e..4335ecbba3c9894e61c087ac970b392fa73abfab 100644
--- a/sysdeps/ieee754/dbl-64/s_sincos.c
+++ b/sysdeps/ieee754/dbl-64/s_sincos.c
@@ -28,37 +28,6 @@
 #define IN_SINCOS 1
 #include "s_sin.c"
 
-/* Consolidated version of reduce_and_compute in s_sin.c that does range
-   reduction only once and computes sin and cos together.  */
-static inline void
-__always_inline
-reduce_and_compute_sincos (double x, double *sinx, double *cosx)
-{
-  double a, da;
-  unsigned int n = __branred (x, &a, &da);
-
-  n = n & 3;
-
-  if (n == 1 || n == 2)
-    {
-      a = -a;
-      da = -da;
-    }
-
-  if (n & 1)
-    {
-      double *temp = cosx;
-      cosx = sinx;
-      sinx = temp;
-    }
-
-  if (a * a < 0.01588)
-    *sinx = bsloww (a, da, x, n);
-  else
-    *sinx = bsloww1 (a, da, x, n);
-  *cosx = bsloww2 (a, da, x, n);
-}
-
 void
 __sincos (double x, double *sinx, double *cosx)
 {
@@ -88,8 +57,12 @@ __sincos (double x, double *sinx, double *cosx)
     }
   if (k < 0x7ff00000)
     {
-      reduce_and_compute_sincos (x, sinx, cosx);
-      return;
+      double a, da;
+      int4 n = __branred (x, &a, &da);
+
+      *sinx = do_sincos (a, da, n);
+      *cosx = do_sincos (a, da, n + 1);
+      return;
     }
 
   if (isinf (x))