From patchwork Tue Mar 6 21:33:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Steve Ellcey X-Patchwork-Id: 26221 Received: (qmail 27255 invoked by alias); 6 Mar 2018 21:33:26 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 26700 invoked by uid 89); 6 Mar 2018 21:33:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.0 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS autolearn=ham version=3.3.2 spammy=ux, 19545, u1x, 6* X-HELO: NAM02-BL2-obe.outbound.protection.outlook.com Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Steve.Ellcey@cavium.com; Message-ID: <1520371988.6774.82.camel@cavium.com> Subject: RFC: Proposal for implementing libmvec on aarch64 From: Steve Ellcey Reply-To: sellcey@cavium.com To: libc-alpha Date: Tue, 06 Mar 2018 13:33:08 -0800 Mime-Version: 1.0 X-ClientProxiedBy: CO2PR18CA0057.namprd18.prod.outlook.com (2603:10b6:104:2::25) To DM6PR07MB4571.namprd07.prod.outlook.com (2603:10b6:5:c1::21) X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: baef91e4-91b8-4d03-7513-08d583a9e096 X-Microsoft-Antispam: UriScan:; BCL:0; PCL:0; RULEID:(7020095)(4652020)(5600026)(4604075)(4534165)(4627221)(201703031133081)(201702281549075)(2017052603328)(7153060)(49563074)(7193020); SRVR:DM6PR07MB4571; X-Microsoft-Exchange-Diagnostics: 1; DM6PR07MB4571; 3:Mu0xB4cdpdUX7zUduHuhjlahyauBMM7eg6OvPhwVUpUET/gqfpvzV4C4jfmOMMlcXifOPH8Ue6J8Gc4OvTE6hYj+eGObP8xUsMZllaNXH2CCJYh2qfrY+qDuy+1FLM5uMnMyTfD1zcjeUD/fq5VLip2qTnXM++2DxQuttJpRDfYJ9x/WgwUHuQlbHHpQlXFHGqY1v4X90XJQ6kIyQbXcq1CYshndBMirP0cu5MYLrKqmd7AhV6P/BAP+r5r7VYNd; 25:wSF2dKaM29dj4/Mzy3amCn1fwQAy/pR1UN447X7KfiGOE0M97jGS1QQsikWHp3JwiJ848drDUHAZS6D9lN71y2uZHoZote2cxfxv80n8FC9UlH1KqMqm2NGQi2egHRwYbbukF6XpzqZyU40mPiHWjztiMZEC1k8H3NAkzZrYCophBgs8kDJKsuXGee6HoFJPVlhVo7TxVTSkBrcVp4URyYKH08cReQ8f0HkZk4cOJwo6tuJ7Pnib3znm9qG3YqHdgdDhmTXnskQNvVyflc/wUdJitz/9JRY+7N6Kt5aC75wSWqzTao73oWizYsPomO9WT76yfAL5cymlflEe/Enuiw==; 31:KAWEg34KXjfggbv9yOUhCo6AdCz++31Q/JXEcDJPcO0EHnXNiMA82cIexApjqRuw7y2BimOVCjAZpOTNTPq6Ib6QMCi4QKsvP5AzR2NdOi7Zve1zX6xrjz0AozvVMxBVBPMFUnSeLRgQ3hV+VvxOVa274r44tVoQ68idIwo4gq/qxzVCzBvJuMk7DBY85F5OxMzIy9TI84MYfbX635Oh8AAh8sLtBZ5px4O352/LUlo= X-MS-TrafficTypeDiagnostic: DM6PR07MB4571: X-Microsoft-Exchange-Diagnostics: 1; DM6PR07MB4571; 20:2R5njgdXshznXpY0CWjG6VIenUckX0DXAYtz6/gVinWD81hwfaFaPkm2x9qnBpfjMNSUV/v0ve2iV7MMgfmhLr5gPYiJ1ND8xuPFFmR06KX1vpexes/k3wMwgmTYt/AK+7qEYWJrhMkdtChkxNzaNcGx0VI4fwsji37eFlqU8SeGCq7zsWziJSlViy3OHxoDeJnF1aZLuRakmmHH0TtcmLGq1O3U4VykysqL//UqYjl6CI+fBHKFKc9J44e5pgpprdARDxv/sC8JB6RkQX35xFjpVFAtBR5MISVyTzb8I8Zp30Ar6t9IfIyAj4y1ozmv7KrDtEhNwK4ZmLM7u0Dxdj62e0GuvZjo3MoE7KoGuD0bECW3Q/UmNi7oRDbA5J+AZt6fu1CWvTUyKutTcH8FBbiYij8qHCrLrkbow0xc+5aDMklScgQTGzwDm6ktR9NpO/9VXV58IuiP0ZeJBFaB9ugBDlK/ZjSu68Ey6Qpn+n++jdH0uB9WumFK1A5CzqMG; 4:lUF5nvElKh90NanjUBoW21OYNGQ+2ylJkqRa4oL1EijcpabFP9vXUimcr1Nuhc+ixaKcUFtkJ0eru2fhNWr7/LWZQvm0tb/6Vm3VvTSgWcB8gD/bNkMiHXaI2/p+ZaZkHdNWzkHTUvZEpIgnsMOt+to9tUtvrSibREyPgrxGln+RYLpFDL10BvkYy+yYrZX/7y9fpT9vRQ6KZrwcdHcdt1WFeTFWVBwNorCj/D/H8q7ANa61fbq8RgX20W7oTccHqlIoZjnuPaVe4rq9P5OGSw== X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0; PCL:0; RULEID:(8211001083)(102415395)(6040501)(2401047)(5005006)(8121501046)(93006095)(93001095)(10201501046)(3231220)(944501244)(52105095)(3002001)(6041288)(20161123564045)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123558120)(20161123560045)(20161123562045)(6072148)(201708071742011); SRVR:DM6PR07MB4571; BCL:0; PCL:0; RULEID:; SRVR:DM6PR07MB4571; X-Forefront-PRVS: 06036BD506 X-Forefront-Antispam-Report: SFV:NSPM; SFS:(10009020)(346002)(396003)(39860400002)(39380400002)(366004)(376002)(199004)(189003)(6916009)(186003)(25786009)(50226002)(7736002)(68736007)(105586002)(3450700001)(3846002)(6116002)(6486002)(6666003)(305945005)(81166006)(81156014)(26005)(16526019)(4610100001)(103116003)(69596002)(66066001)(84326002)(386003)(316002)(6506007)(33964004)(43066004)(568964002)(2906002)(53936002)(16586007)(478600001)(4810100001)(53416004)(5660300001)(72206003)(8936002)(6512007)(8676002)(36756003)(97736004)(106356001)(52116002)(5890100001)(2476003)(99106002); DIR:OUT; SFP:1101; SCL:1; SRVR:DM6PR07MB4571; H:sellcey-dt.caveonetworks.com; FPR:; SPF:None; PTR:InfoNoRecords; A:1; MX:1; LANG:en; Received-SPF: None (protection.outlook.com: cavium.com does not designate permitted sender hosts) X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1; DM6PR07MB4571; 23:oM0bzXrLrU33DzzX2H5xiGj3UlXCtgc6ZNqiT9xPx?= =?us-ascii?Q?4ZVhgSErgxCHwZWsWLk08ndT3/yaAcxE1rFC/Uve1Z2OIRaLKrmLvUxiGTjb?= =?us-ascii?Q?5MDE3DlJVVJ3dBdA6qoSpbS/mAvdbQ43wK5v4E7Q87q0nM2VCHrjrU7WSRwj?= =?us-ascii?Q?cXepItIOk0Q8crvEUJNJXzQlCmHCC78WUgua/3VawhNHHiOOv8bJoF9AKUDS?= =?us-ascii?Q?2iubeeozkqPlXs6/qnblOsRAcCKz80uPbEF2Cg2mW+wWKjwhITRz4roBb677?= =?us-ascii?Q?kpY6XG9+yxZprRBafZFnpha203BYE8fkN/hRky6ze3GYED5JypGCDNoGZfHz?= =?us-ascii?Q?AeKHYOq0hrGfjm2V1Jdig2KYTX96H0op7zzyjxS61b7bzwQ4DMEJqWBJ9I32?= =?us-ascii?Q?4F0vZdJMNkDKzNhDCYAnhx8C7ul4FSmFee63i58cIPwp6sJltilg/oQ5uwqh?= =?us-ascii?Q?KMZjcqlvWyWtalR3+T7YYLvfIgUQTVgY3ciUJFno3KYzUtHcFrdc67NWbHOJ?= =?us-ascii?Q?mJNxn8St0SN7vu2eHHq/YMv0W9M+V9A6+IVZyiixzeYtMmimzFZaX1/RFqxl?= =?us-ascii?Q?GPLCcYbcXKRrljJ6v7O7CQvtPzlbxsLqwePy26QnNI7C9axjKqUjhJ3kWQf0?= =?us-ascii?Q?tyj7p/ec14p2ZgUKOWApKGVZixyJJ2mj3dTFeWqsAq3zIn/TU2ZZMMJi7Z2b?= =?us-ascii?Q?sHmBIdmPYIFEu643xI2meBr5Fe7VYaLQO8mM1YC7J+8yPZ6hFr6xM9rdNvRD?= =?us-ascii?Q?XrRev9uRVt/6FSRpTaq0iZpdj8okphmwTNptGvI/Tmn1BF0Y26CJZ03mX9C8?= =?us-ascii?Q?tZH5De+T3tyUExqX0odyNoARm08G55YrtR0r3k+XYM7pJNtT0MTUZm0nqwgv?= =?us-ascii?Q?Kfdtub7j2dSpBIV7nNqIHOxq3MgGHjprs4YeYhXBDyVRHxBI+kff6o86Q6LT?= =?us-ascii?Q?JM2Fus37qFVuUsVoDMsgfJXcTCDI/BYF0xNRaP0qwdLXoSqnvJrwExSCrnJC?= =?us-ascii?Q?GdJUwgsx7NN6/8nyK7UDHi7tts/UO7rgsY0e1ECjzLIRuUhHcBzB+NZKEjc5?= =?us-ascii?Q?pORthttVD2fcFBxj2HK/8sBaJxz1B2yY75ZBCiE8gAgKXVmuWAeO3uG1ZaAu?= =?us-ascii?Q?dQGrmCAqySpldtdlrzz6s5uXPdOrvFj4xPoZx2PKBVPBsvqtWWqR+yInfgzg?= =?us-ascii?Q?JXXwlEhfSXDeGQzzGo3D9dbhNJi5LZzPF4SBf7Bn3IYRsiPOT1Q4gAdEo8Nw?= =?us-ascii?Q?itGpPGsXkL55BM5XNk=3D?= X-Microsoft-Antispam-Message-Info: dalNY5vUNxHdRJn6MeW8tvohcZdDMCUmDBoLyKyZBlHOhB5Hqe47YV0lH8ao+C/ZX6g7gbdksSxLgzfUmTVyW2fk0vShLBqyvR0ONZpXKgObpt/1vKdK5WY3nYQmY9UpcJ9jdofnlmoGS0rqu2JkslIQ+MuJtSVPLk6Pgo1HMDeDslJtNE8BooON830KMLeV X-Microsoft-Exchange-Diagnostics: 1; DM6PR07MB4571; 6:5Vj9rjt5WffV5MeuB2I1Mh/YM9/TJ8yzlZdYZocqD4dWVHd7nLUYnZcofl4snRB38xBA7WolBdbv6Q8FPE1pMU8SoSPyjQPIBkx72pKQ+P7cpZr1C31AzCsGN5sDf6fTQ9ry21hirWfSHAY6FbjwvUjlWBObRl+lYp9ZUInILMrnkzcU9g6B+dcDAVbJppXf50xBJ0Jlu8+wWUCZYBcxdVfDeYTgthh6fBqziLAFf9jSKC0ukiyTHfkIR2+k04GsmyaePvKCDYD3vAXyGCSDCYHX9iBYYzhNbuoLTYQeIuEOnZ9M+Xu1o54sOd2TDBvnPNfLU6S22q5GhrjJqGWIhIRKLO7M43U1wVZmiBxbTL8=; 5:99vzE96O3yg8fSGN0NPWmTxeVn7s5vljfHTJvs2PHgeC36QFN23xGUY2IpDqUk00efZiG2Juz8CXmLFbZtyjc60o/az1JYbFbM/bX8UunRXY2Y9Mx1b+cGEKPDmjlSHWY6QWzFB6yDYEsef/7EMuxrroXofVKJPr4gI98Tu7Ce0=; 24:p9H0yHh5wWjGXWrxrZ6/H3tF1MMIPBAt47Tx3sP2fCqBevrHsy9/z6IZiZWRJWuINQA6bwKMDqiEP6iscifh9ZierI5d2rzm+r/lzLgPJTk=; 7:TnLDezzRaJqdZSTn+UV2TS2/THIlRZO4T5olnNsQsS34td0UFhfROtD7GGAWy8Jj2ZeXrn+qbFNKGv7hu8LevbKavEnPn4814/TCRl7LAIr+vw37oTc2scu0hO+NPRXyiGbg9dCC1kP8aaVBO0DcKlLou46kcAxRJofIOIdlnUe7NpZc9FJX494E0k2887/CkQdpJCykia6B0Pp2i9TDs+8WPsN/JC4B1OBoXeodjluN0Atpq3LNqiHihki8L3T9 SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: cavium.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Mar 2018 21:33:18.1274 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: baef91e4-91b8-4d03-7513-08d583a9e096 X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 711e4ccf-2e9b-4bcf-a551-4094005b6194 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR07MB4571 This is an RFC for an approach to implementing libmvec vector math routines for aarch64.  There are three patches attached to this email, one is a GCC patch that allows aarch64 to generate calls to vector functions that are marked with __attribute__ ((simd)), the second is the infrastructure part of glibc changes that mark sin and cos with this attribute in the header files, and the third is an implementation of vector sin and cos by adding ifdefs to the existing scalar sin/cos functions in sysdeps/ieee754/dbl-64/s_sin.c. I have several questions that I would like to get some feedback on: Does the approach of modifying the existing scalar math routines to also work on vector types, as opposed to having completely seperate vector routines sound like a good approach?  I like it because it reduced the amount of duplicated code between scalar and vector functions and should keep the accuracy of scalar and vector routines in sync.  And since it is written in C it can also be shared among multiple platforms.  The main downside right now is the slow paths are still mostly handled serially and so I am only seeing about a 5% improvement in a test that does sin calls on an array of doubles.  I think more changes could be done to improve the amount of parallelism in the vector versions but this is a start. Perhaps we could remove some of the slow paths in the vector versions like we do with the sincos function. The current implementation does not build because when linking libmvec it gets undefined externals for __branred, __docos, __dubsin, __mpcos, __mpsin, and __sincostab.  How should this be handled?  Should they be exported from libm so that libmvec can call them or should they be  copied into libmvec so that they don't have to be externally visible? Are there any issues with making libmvec depend on libm? The vector ABI for aarch64 is not yet defined, I believe ARM is working on that.  As you can see from the GCC patch I am just using the same approach as x86 and just changed the vecsize_mangle variable to 'n' for 128 bit vectors on Aarch64. Comments? Steve Ellcey sellcey@cavium.com diff --git a/sysdeps/aarch64/fpu/libmvec_d_sincos2.c b/sysdeps/aarch64/fpu/libmvec_d_sincos2.c index e69de29..f92291f 100644 --- a/sysdeps/aarch64/fpu/libmvec_d_sincos2.c +++ b/sysdeps/aarch64/fpu/libmvec_d_sincos2.c @@ -0,0 +1,21 @@ + +/* We need the includes before the defines so that the define of __sin + and __cos do not mess up the bits/mathcalls.h declaration of __sin + and __cos. */ + +#include +#include "endian.h" +#include "mydefs.h" +#include "usncs.h" +#include "MathLib.h" +#include +#include +#include +#include + +#define VEC_SIZE 2 +#define OP_TYPE __Float64x2_t +#define __sin _ZGVnN2v_sin +#define __cos _ZGVnN2v_cos + +#include "sysdeps/ieee754/dbl-64/s_sin.c" diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c index 8c589cb..e1db599 100644 --- a/sysdeps/ieee754/dbl-64/s_sin.c +++ b/sysdeps/ieee754/dbl-64/s_sin.c @@ -55,6 +55,18 @@ #include #include +#ifndef VEC_SIZE +# define VEC_SIZE 1 +#endif + +#if VEC_SIZE != 1 && VEC_SIZE != 2 +# error "This file must be updated to support other vector lengths." +#endif + +#ifndef OP_TYPE +# define OP_TYPE double +#endif + /* Helper macros to compute sin of the input values. */ #define POLYNOMIAL2(xx) ((((s5 * (xx) + s4) * (xx) + s3) * (xx) + s2) * (xx)) @@ -67,13 +79,25 @@ The constants s1, s2, s3, etc. are pre-computed values of 1/3!, 1/5! and so on. The result is returned to LHS and correction in COR. */ -#define TAYLOR_SIN(xx, a, da, cor) \ -({ \ - double t = ((POLYNOMIAL (xx) * (a) - 0.5 * (da)) * (xx) + (da)); \ - double res = (a) + t; \ - (cor) = ((a) - res) + t; \ - res; \ -}) +static inline double +__always_inline +taylor_sin_scalar (double xx, double a, double da, double *cor) +{ + double t = (POLYNOMIAL (xx) * a - 0.5 * da) * xx + da; + double res = a + t; + *cor = (a - res) + t; + return res; +} + +static inline OP_TYPE +__always_inline +taylor_sin_vector (OP_TYPE xx, OP_TYPE a, OP_TYPE da, OP_TYPE *cor) +{ + OP_TYPE t = (POLYNOMIAL (xx) * a - 0.5 * da) * xx + da; + OP_TYPE res = a + t; + *cor = (a - res) + t; + return res; +} /* This is again a variation of the Taylor series expansion with the term x^3/3! expanded into the following for better accuracy: @@ -126,10 +150,11 @@ static const double static const double t22 = 0x1.8p22; -void __dubsin (double x, double dx, double w[]); -void __docos (double x, double dx, double w[]); -double __mpsin (double x, double dx, bool reduce_range); -double __mpcos (double x, double dx, bool reduce_range); +extern void __dubsin (double x, double dx, double w[]); +extern void __docos (double x, double dx, double w[]); +extern double __mpsin (double x, double dx, bool reduce_range); +extern double __mpcos (double x, double dx, bool reduce_range); + static double slow (double x); static double slow1 (double x); static double slow2 (double x); @@ -148,7 +173,7 @@ static double cslow2 (double x); get the result in RES and a correction value in COR. */ static inline double __always_inline -do_cos (double x, double dx, double *corp) +do_cos_scalar (double x, double dx, double *corp) { mynumber u; @@ -170,6 +195,45 @@ do_cos (double x, double dx, double *corp) return res; } +static inline OP_TYPE +__always_inline +do_cos_vector (OP_TYPE x, OP_TYPE dx, OP_TYPE *corp) +{ +#if VEC_SIZE == 1 + mynumber u; + if (x < 0) + dx = -dx; + u.x = big + fabs (x); + x = fabs (x) - (u.x - big) + dx; +#else + mynumber u0, u1; + if (x[0] < 0) + dx[0] = -dx[0]; + if (x[1] < 0) + dx[1] = -dx[1]; + u0.x = big + fabs (x[0]); + u1.x = big + fabs (x[1]); + x[0] = fabs (x[0]) - (u0.x - big) + dx[0]; + x[1] = fabs (x[1]) - (u1.x - big) + dx[1]; +#endif + + OP_TYPE xx, s, sn, ssn, c, cs, ccs, res, cor; + xx = x * x; + s = x + x * xx * (sn3 + xx * sn5); + c = xx * (cs2 + xx * (cs4 + xx * cs6)); +#if VEC_SIZE == 1 + SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs); +#else + SINCOS_TABLE_LOOKUP (u0, sn[0], ssn[0], cs[0], ccs[0]); + SINCOS_TABLE_LOOKUP (u1, sn[1], ssn[1], cs[1], ccs[1]); +#endif + cor = (ccs - s * ssn - cs * c) - sn * s; + res = cs + cor; + cor = (cs - res) + cor; + *corp = cor; + return res; +} + /* A more precise variant of DO_COS. EPS is the adjustment to the correction COR. */ static inline double @@ -210,7 +274,7 @@ do_cos_slow (double x, double dx, double eps, double *corp) the result in RES and a correction value in COR. */ static inline double __always_inline -do_sin (double x, double dx, double *corp) +do_sin_scalar (double x, double dx, double *corp) { mynumber u; @@ -231,6 +295,45 @@ do_sin (double x, double dx, double *corp) return res; } +static inline OP_TYPE +__always_inline +do_sin_vector (OP_TYPE x, OP_TYPE dx, OP_TYPE *corp) +{ +#if VEC_SIZE == 1 + mynumber u; + if (x <= 0) + dx = -dx; + u.x = big + fabs (x); + x = fabs (x) - (u.x - big); +#else + mynumber u0, u1; + if (x[0] <= 0) + dx[0] = -dx[0]; + if (x[1] <= 0) + dx[1] = -dx[1]; + u0.x = big + fabs (x[0]); + u1.x = big + fabs (x[1]); + x[0] = fabs (x[0]) - (u0.x - big); + x[1] = fabs (x[1]) - (u1.x - big); +#endif + + OP_TYPE xx, s, sn, ssn, c, cs, ccs, cor, res; + xx = x * x; + s = x + (dx + x * xx * (sn3 + xx * sn5)); + c = x * dx + xx * (cs2 + xx * (cs4 + xx * cs6)); +#if VEC_SIZE == 1 + SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs); +#else + SINCOS_TABLE_LOOKUP (u0, sn[0], ssn[0], cs[0], ccs[0]); + SINCOS_TABLE_LOOKUP (u1, sn[1], ssn[1], cs[1], ccs[1]); +#endif + cor = (ssn + s * ccs - sn * c) + cs * s; + res = sn + cor; + cor = (sn - res) + cor; + *corp = cor; + return res; +} + /* A more precise variant of DO_SIN. EPS is the adjustment to the correction COR. */ static inline double @@ -337,13 +440,13 @@ do_sincos_1 (double a, double da, double x, int4 n, bool shift_quadrant) if (xx < 0.01588) { /* Taylor series. */ - res = TAYLOR_SIN (xx, a, da, cor); + res = taylor_sin_scalar (xx, a, da, &cor); cor = 1.02 * cor + __copysign (eps, cor); retval = (res == res + cor) ? res : sloww (a, da, x, shift_quadrant); } else { - res = do_sin (a, da, &cor); + res = do_sin_scalar (a, da, &cor); cor = 1.035 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? __copysign (res, a) : sloww1 (a, da, x, shift_quadrant)); @@ -352,7 +455,7 @@ do_sincos_1 (double a, double da, double x, int4 n, bool shift_quadrant) case 1: case 3: - res = do_cos (a, da, &cor); + res = do_cos_scalar (a, da, &cor); cor = 1.025 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? ((n & 2) ? -res : res) : sloww2 (a, da, x, n)); @@ -411,13 +514,13 @@ do_sincos_2 (double a, double da, double x, int4 n, bool shift_quadrant) if (xx < 0.01588) { /* Taylor series. */ - res = TAYLOR_SIN (xx, a, da, cor); + res = taylor_sin_scalar (xx, a, da, &cor); cor = 1.02 * cor + __copysign (eps, cor); retval = (res == res + cor) ? res : bsloww (a, da, x, n); } else { - res = do_sin (a, da, &cor); + res = do_sin_scalar (a, da, &cor); cor = 1.035 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? __copysign (res, a) : bsloww1 (a, da, x, n)); @@ -426,7 +529,7 @@ do_sincos_2 (double a, double da, double x, int4 n, bool shift_quadrant) case 1: case 3: - res = do_cos (a, da, &cor); + res = do_cos_scalar (a, da, &cor); cor = 1.025 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? ((n & 2) ? -res : res) : bsloww2 (a, da, x, n)); @@ -436,33 +539,84 @@ do_sincos_2 (double a, double da, double x, int4 n, bool shift_quadrant) return retval; } +static double +__always_inline +do_sincos_tail (double x, int4 k, int4 kl, bool shift_quadrant) +{ + int4 n; + double retval; + if (k < 0x419921FB) + { + double a, da; + n = reduce_sincos_1 (x, &a, &da); + retval = do_sincos_1 (a, da, x, n, shift_quadrant); + } + else if (k < 0x42F00000) + { + double a, da; + n = reduce_sincos_2 (x, &a, &da); + retval = do_sincos_2 (a, da, x, n, shift_quadrant); + } + else if (k < 0x7ff00000) + { + retval = reduce_and_compute (x, shift_quadrant); + } + else + { + if (k == 0x7ff00000 && kl == 0) + __set_errno (EDOM); + retval = x / x; + } + return retval; +} + /*******************************************************************/ /* An ultimate sin routine. Given an IEEE double machine number x */ /* it computes the correctly rounded (to nearest) value of sin(x) */ /*******************************************************************/ #ifdef IN_SINCOS -static double +static OP_TYPE #else -double +OP_TYPE SECTION #endif -__sin (double x) +__sin (OP_TYPE x) { - double xx, res, t, cor; - mynumber u; - int4 k, m; - double retval = 0; + OP_TYPE xx, res, res2, t, cor; + OP_TYPE retval; #ifndef IN_SINCOS SET_RESTORE_ROUND_53BIT (FE_TONEAREST); #endif +#if VEC_SIZE == 1 + mynumber u; + int4 m, k; + OP_TYPE zero = 0.0; u.x = x; m = u.i[HIGH_HALF]; k = 0x7fffffff & m; /* no sign */ +#else /* VEC_SIZE == 2 */ + mynumber u0, u1; + int4 m0, m1, k0, k1, k; + OP_TYPE zero = {0.0, 0.0}; + u0.x = x[0]; + u1.x = x[1]; + m0 = u0.i[HIGH_HALF]; + m1 = u1.i[HIGH_HALF]; + k0 = 0x7fffffff & m0; + k1 = 0x7fffffff & m1; + k = k0 > k1 ? k0 : k1; +#endif + if (k < 0x3e500000) /* if x->0 =>sin(x)=x */ { +#if VEC_SIZE == 1 math_check_force_underflow (x); +#else + math_check_force_underflow (x[0]); + math_check_force_underflow (x[1]); +#endif retval = x; } /*---------------------------- 2^-26 < |x|< 0.25 ----------------------*/ @@ -473,54 +627,64 @@ __sin (double x) t = POLYNOMIAL (xx) * (xx * x); res = x + t; cor = (x - res) + t; - retval = (res == res + 1.07 * cor) ? res : slow (x); + res2 = res + 1.07 * cor; +#if VEC_SIZE == 1 + retval = (res == res2) ? res : slow (x); +#else + retval[0] = (res[0] == res2[0]) ? res[0] : slow (x[0]); + retval[1] = (res[1] == res2[1]) ? res[1] : slow (x[1]); +#endif } /* else if (k < 0x3fd00000) */ + /*---------------------------- 0.25<|x|< 0.855469---------------------- */ else if (k < 0x3feb6000) { - res = do_sin (x, 0, &cor); - retval = (res == res + 1.096 * cor) ? res : slow1 (x); + res = do_sin_vector (x, zero, &cor); + res2 = res + 1.096 * cor; +#if VEC_SIZE == 1 + retval = (res == res2) ? res : slow1 (x); retval = __copysign (retval, x); +#else + retval[0] = (res[0] == res2[0]) ? res[0] : slow1 (x[0]); + retval[1] = (res[1] == res2[1]) ? res[1] : slow1 (x[1]); + retval[0] = __copysign (retval[0], x[0]); + retval[1] = __copysign (retval[1], x[1]); +#endif } /* else if (k < 0x3feb6000) */ /*----------------------- 0.855469 <|x|<2.426265 ----------------------*/ else if (k < 0x400368fd) { - +#if VEC_SIZE == 1 t = hp0 - fabs (x); - res = do_cos (t, hp1, &cor); - retval = (res == res + 1.020 * cor) ? res : slow2 (x); + res = do_cos_vector (t, hp1, &cor); + res2 = res + 1.020 * cor; + retval = (res == res2) ? res : slow2 (x); retval = __copysign (retval, x); +#else + OP_TYPE hp1_vec; + t[0] = hp0 - fabs (x[0]); + t[1] = hp0 - fabs (x[1]); + hp1_vec[0] = hp1_vec[1] = hp1; + res = do_cos_vector (t, hp1_vec, &cor); + res2 = res + 1.020 * cor; + retval[0] = (res[0] == res2[0]) ? res[0] : slow2 (x[0]); + retval[1] = (res[1] == res2[1]) ? res[1] : slow2 (x[1]); + retval[0] = __copysign (retval[0], x[0]); + retval[1] = __copysign (retval[1], x[1]); +#endif } /* else if (k < 0x400368fd) */ #ifndef IN_SINCOS /*-------------------------- 2.426265<|x|< 105414350 ----------------------*/ - else if (k < 0x419921FB) - { - double a, da; - int4 n = reduce_sincos_1 (x, &a, &da); - retval = do_sincos_1 (a, da, x, n, false); - } /* else if (k < 0x419921FB ) */ - -/*---------------------105414350 <|x|< 281474976710656 --------------------*/ - else if (k < 0x42F00000) - { - double a, da; - - int4 n = reduce_sincos_2 (x, &a, &da); - retval = do_sincos_2 (a, da, x, n, false); - } /* else if (k < 0x42F00000 ) */ - -/* -----------------281474976710656 <|x| <2^1024----------------------------*/ - else if (k < 0x7ff00000) - retval = reduce_and_compute (x, false); - -/*--------------------- |x| > 2^1024 ----------------------------------*/ else { - if (k == 0x7ff00000 && u.i[LOW_HALF] == 0) - __set_errno (EDOM); - retval = x / x; +#if VEC_SIZE == 1 + retval = do_sincos_tail (x, k, u.i[LOW_HALF], false); +#else + retval[0] = do_sincos_tail (x[0], k0, u0.i[LOW_HALF], false); + retval[1] = do_sincos_tail (x[1], k1, u1.i[LOW_HALF], false); +#endif } #endif @@ -534,85 +698,122 @@ __sin (double x) /*******************************************************************/ #ifdef IN_SINCOS -static double +static OP_TYPE #else -double +OP_TYPE SECTION #endif -__cos (double x) +__cos (OP_TYPE x) { - double y, xx, res, cor, a, da; - mynumber u; - int4 k, m; - - double retval = 0; + OP_TYPE y, xx, res, res2, a, da, cor; + OP_TYPE retval; #ifndef IN_SINCOS SET_RESTORE_ROUND_53BIT (FE_TONEAREST); #endif +#if VEC_SIZE == 1 + mynumber u; + int4 m, k; + OP_TYPE zero = 0.0; + OP_TYPE one = 1.0; u.x = x; m = u.i[HIGH_HALF]; - k = 0x7fffffff & m; + k = 0x7fffffff & m; /* no sign */ +#else /* VEC_SIZE == 2 */ + mynumber u0, u1; + int4 m0, m1, k0, k1, k; + OP_TYPE zero = {0.0, 0.0}; + OP_TYPE one = {1.0, 1.0}; + u0.x = x[0]; + u1.x = x[1]; + m0 = u0.i[HIGH_HALF]; + m1 = u1.i[HIGH_HALF]; + k0 = 0x7fffffff & m0; + k1 = 0x7fffffff & m1; + k = k0 > k1 ? k0 : k1; +#endif /* |x|<2^-27 => cos(x)=1 */ if (k < 0x3e400000) - retval = 1.0; + retval = one; else if (k < 0x3feb6000) { /* 2^-27 < |x| < 0.855469 */ - res = do_cos (x, 0, &cor); - retval = (res == res + 1.020 * cor) ? res : cslow2 (x); + res = do_cos_vector(x, zero, &cor); + res2 = res + 1.020 * cor; +#if VEC_SIZE == 1 + retval = (res == res2) ? res : cslow2 (x); +#else + retval[0] = (res[0] == res2[0]) ? res[0] : cslow2 (x[0]); + retval[1] = (res[1] == res2[1]) ? res[1] : cslow2 (x[1]); +#endif } /* else if (k < 0x3feb6000) */ else if (k < 0x400368fd) { /* 0.855469 <|x|<2.426265 */ ; +#if VEC_SIZE == 1 y = hp0 - fabs (x); a = y + hp1; da = (y - a) + hp1; xx = a * a; if (xx < 0.01588) { - res = TAYLOR_SIN (xx, a, da, cor); + res = taylor_sin_scalar (xx, a, da, &cor); cor = 1.02 * cor + __copysign (1.0e-31, cor); retval = (res == res + cor) ? res : sloww (a, da, x, true); } else { - res = do_sin (a, da, &cor); + res = do_sin_scalar (a, da, &cor); cor = 1.035 * cor + __copysign (1.0e-31, cor); retval = ((res == res + cor) ? __copysign (res, a) : sloww1 (a, da, x, true)); } - +#else + OP_TYPE hp1_vec, cors; + hp1_vec[0] = hp1_vec[1] = hp1; + y[0] = hp0 - fabs (x[0]); + y[1] = hp0 - fabs (x[1]); + a = y + hp1_vec; + da = (y - a) + hp1_vec; + xx = a * a; + if (xx[0] < 0.01588 && x[1] < 0.01588) + { + res = taylor_sin_vector (xx, a, da, &cor); + cors[0] = __copysign (1.0e-31, cor[0]); + cors[1] = __copysign (1.0e-31, cor[1]); + cor = 1.02 * cor + cors; + res2 = res + cor; + retval[0] = (res[0] == res2[0]) ? res[0] + : sloww (a[0], da[0], x[0], true); + retval[1] = (res[1] == res2[1]) ? res[1] + : sloww (a[1], da[1], x[1], true); + } + else + { + res = do_sin_vector (a, da, &cor); + cors[0] = __copysign (1.0e-31, cor[0]); + cors[1] = __copysign (1.0e-31, cor[1]); + cor = 1.035 * cor + cors; + res2 = res + cor; + retval[0] = __copysign (res[0], a[0]); + retval[1] = __copysign (res[1], a[1]); + retval[0] = (res[0] == res2[0]) ? retval[0] : sloww1 (a[0], da[0], x[0], true); + retval[1] = (res[1] == res2[1]) ? retval[1] : sloww1 (a[1], da[1], x[1], true); + } +#endif } /* else if (k < 0x400368fd) */ - #ifndef IN_SINCOS - else if (k < 0x419921FB) - { /* 2.426265<|x|< 105414350 */ - double a, da; - int4 n = reduce_sincos_1 (x, &a, &da); - retval = do_sincos_1 (a, da, x, n, true); - } /* else if (k < 0x419921FB ) */ - - else if (k < 0x42F00000) - { - double a, da; - - int4 n = reduce_sincos_2 (x, &a, &da); - retval = do_sincos_2 (a, da, x, n, true); - } /* else if (k < 0x42F00000 ) */ - - /* 281474976710656 <|x| <2^1024 */ - else if (k < 0x7ff00000) - retval = reduce_and_compute (x, true); - else { - if (k == 0x7ff00000 && u.i[LOW_HALF] == 0) - __set_errno (EDOM); - retval = x / x; /* |x| > 2^1024 */ +#if VEC_SIZE == 1 + retval = do_sincos_tail (x, k, u.i[LOW_HALF], true); +#else + retval[0] = do_sincos_tail (x[0], k0, u0.i[LOW_HALF], true); + retval[1] = do_sincos_tail (x[1], k1, u1.i[LOW_HALF], true); +#endif } #endif