From patchwork Fri Aug 4 23:15:32 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Steve Ellcey
X-Patchwork-Id: 21928
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Delivered-To: mailing list libc-alpha@sourceware.org
Message-ID: <1501888532.3962.92.camel@cavium.com>
Subject: Re: [PATCH 3/4] Add ILP32 support to aarch64
From: Steve Ellcey
Reply-To: sellcey@cavium.com
To: Joseph Myers, Wilco Dijkstra
Cc: "Ellcey, Steve", nd, "libc-alpha@sourceware.org"
Date: Fri, 04 Aug 2017 16:15:32 -0700

On Fri, 2017-08-04 at 00:12 +0000, Joseph Myers wrote:
> On Thu, 3 Aug 2017, Wilco Dijkstra wrote:
> 
> > The generic implementation may well be faster... I'm not sure where the
> > requirement of not raising inexact comes from (I don't see it in the definition
> > of lrint, and we generally don't care since inexact is set by almost every FP
> > calculation), but if it is absolutely required you'd special case values larger
> > than LONG_MAX.
> The requirement comes from lrint being bound to IEEE 754 conversion
> operations, so only raising inexact under the conditions specified and no
> spurious inexact.

Here is a new version of this patch.  It (mostly) avoids fenv calls when
not needed and preserves any exceptions that may be set on entry to the
function.
Steve Ellcey
sellcey@cavium.com


2017-08-04  Steve Ellcey  <sellcey@cavium.com>

	* sysdeps/aarch64/fpu/s_llrint.c (OREG_SIZE): New macro.
	* sysdeps/aarch64/fpu/s_llround.c (OREG_SIZE): Likewise.
	* sysdeps/aarch64/fpu/s_llrintf.c (OREGS, IREGS): Remove.
	(IREG_SIZE, OREG_SIZE): New macros.
	* sysdeps/aarch64/fpu/s_llroundf.c (OREGS, IREGS): Remove.
	(IREG_SIZE, OREG_SIZE): New macros.
	* sysdeps/aarch64/fpu/s_lrintf.c (IREGS): Remove.
	(IREG_SIZE): New macro.
	* sysdeps/aarch64/fpu/s_lroundf.c (IREGS): Remove.
	(IREG_SIZE): New macro.
	* sysdeps/aarch64/fpu/s_lrint.c (math_private.h, fenv.h, stdint.h):
	New includes.
	(IREG_SIZE, OREG_SIZE): Initialize if not already set.
	(OREGS, IREGS): Set based on IREG_SIZE and OREG_SIZE.
	(__CONCATX): Handle exceptions correctly on large values that may
	set FE_INVALID.
	* sysdeps/aarch64/fpu/s_lround.c (IREG_SIZE, OREG_SIZE): Initialize
	if not already set.
	(OREGS, IREGS): Set based on IREG_SIZE and OREG_SIZE.

diff --git a/sysdeps/aarch64/fpu/s_llrint.c b/sysdeps/aarch64/fpu/s_llrint.c
index c0d0d0e..57821c0 100644
--- a/sysdeps/aarch64/fpu/s_llrint.c
+++ b/sysdeps/aarch64/fpu/s_llrint.c
@@ -18,4 +18,5 @@
 #define FUNC llrint
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include
diff --git a/sysdeps/aarch64/fpu/s_llrintf.c b/sysdeps/aarch64/fpu/s_llrintf.c
index 67724c6..98ed4f8 100644
--- a/sysdeps/aarch64/fpu/s_llrintf.c
+++ b/sysdeps/aarch64/fpu/s_llrintf.c
@@ -18,6 +18,7 @@
 #define FUNC llrintf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include
diff --git a/sysdeps/aarch64/fpu/s_llround.c b/sysdeps/aarch64/fpu/s_llround.c
index ed4b192..ef7aedf 100644
--- a/sysdeps/aarch64/fpu/s_llround.c
+++ b/sysdeps/aarch64/fpu/s_llround.c
@@ -18,4 +18,5 @@
 #define FUNC llround
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include
diff --git a/sysdeps/aarch64/fpu/s_llroundf.c b/sysdeps/aarch64/fpu/s_llroundf.c
index 360ce8b..294f0f4 100644
--- a/sysdeps/aarch64/fpu/s_llroundf.c
+++ b/sysdeps/aarch64/fpu/s_llroundf.c
@@ -18,6 +18,7 @@
 #define FUNC llroundf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include
diff --git a/sysdeps/aarch64/fpu/s_lrint.c b/sysdeps/aarch64/fpu/s_lrint.c
index 8c61a03..19f9b5b 100644
--- a/sysdeps/aarch64/fpu/s_lrint.c
+++ b/sysdeps/aarch64/fpu/s_lrint.c
@@ -16,7 +16,10 @@
    License along with the GNU C Library; if not, see .  */
+#include
 #include
+#include
+#include
 #ifndef FUNC
 # define FUNC lrint
@@ -24,18 +27,37 @@
 #ifndef ITYPE
 # define ITYPE double
-# define IREGS "d"
+# define IREG_SIZE 64
 #else
-# ifndef IREGS
-# error IREGS not defined
+# ifndef IREG_SIZE
+# error IREG_SIZE not defined
 # endif
 #endif
 #ifndef OTYPE
 # define OTYPE long int
+# ifdef __ILP32__
+# define OREG_SIZE 32
+# else
+# define OREG_SIZE 64
+# endif
+#else
+# ifndef OREG_SIZE
+# error OREG_SIZE not defined
+# endif
+#endif
+
+#if IREG_SIZE == 32
+# define IREGS "s"
+#else
+# define IREGS "d"
 #endif
-#define OREGS "x"
+#if OREG_SIZE == 32
+# define OREGS "w"
+#else
+# define OREGS "x"
+#endif
 #define __CONCATX(a,b) __CONCAT(a,b)
@@ -44,6 +66,33 @@ __CONCATX(__,FUNC) (ITYPE x)
 {
   OTYPE result;
   ITYPE temp;
+
+#if IREG_SIZE == 64 && OREG_SIZE == 32
+  if (__builtin_fabs (x) > INT32_MAX - 2)
+    {
+      /* Converting large values to a 32 bit int may cause the frintx/fcvtzs
+	 sequence to set both FE_INVALID and FE_INEXACT.  To avoid this
+	 we save and restore the FE and only set one or the other.  */
+
+      fenv_t env;
+      bool invalid_p, inexact_p;
+
+      libc_feholdexcept (&env);
+      asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
+	    "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
+	    : "=r" (result), "=w" (temp) : "w" (x) );
+      invalid_p = libc_fetestexcept (FE_INVALID);
+      inexact_p = libc_fetestexcept (FE_INEXACT);
+      libc_fesetenv (&env);
+
+      if (invalid_p)
+	feraiseexcept (FE_INVALID);
+      else if (inexact_p)
+	feraiseexcept (FE_INEXACT);
+
+      return result;
+    }
+#endif
   asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
         "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
         : "=r" (result), "=w" (temp) : "w" (x) );
diff --git a/sysdeps/aarch64/fpu/s_lrintf.c b/sysdeps/aarch64/fpu/s_lrintf.c
index a995e4b..2e73271 100644
--- a/sysdeps/aarch64/fpu/s_lrintf.c
+++ b/sysdeps/aarch64/fpu/s_lrintf.c
@@ -18,5 +18,5 @@
 #define FUNC lrintf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #include
diff --git a/sysdeps/aarch64/fpu/s_lround.c b/sysdeps/aarch64/fpu/s_lround.c
index 9be9e7f..1f77d82 100644
--- a/sysdeps/aarch64/fpu/s_lround.c
+++ b/sysdeps/aarch64/fpu/s_lround.c
@@ -24,18 +24,37 @@
 #ifndef ITYPE
 # define ITYPE double
-# define IREGS "d"
+# define IREG_SIZE 64
 #else
-# ifndef IREGS
-# error IREGS not defined
+# ifndef IREG_SIZE
+# error IREG_SIZE not defined
 # endif
 #endif
 #ifndef OTYPE
 # define OTYPE long int
+# ifdef __ILP32__
+# define OREG_SIZE 32
+# else
+# define OREG_SIZE 64
+# endif
+#else
+# ifndef OREG_SIZE
+# error OREG_SIZE not defined
+# endif
+#endif
+
+#if IREG_SIZE == 32
+# define IREGS "s"
+#else
+# define IREGS "d"
 #endif
-#define OREGS "x"
+#if OREG_SIZE == 32
+# define OREGS "w"
+#else
+# define OREGS "x"
+#endif
 #define __CONCATX(a,b) __CONCAT(a,b)
diff --git a/sysdeps/aarch64/fpu/s_lroundf.c b/sysdeps/aarch64/fpu/s_lroundf.c
index 4a066d4..b30ddb6 100644
--- a/sysdeps/aarch64/fpu/s_lroundf.c
+++ b/sysdeps/aarch64/fpu/s_lroundf.c
@@ -18,5 +18,5 @@
 #define FUNC lroundf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #include
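
As a reading aid for the IREG_SIZE/OREG_SIZE selection in the diff, the following standalone sketch shows how the preprocessor turns the two sizes into register-name prefixes and how adjacent string literals then join into the asm template. The chosen sizes (64-bit input, 32-bit output, as lrint would presumably use under ILP32) and the names asm_template and print_asm_template are assumptions made for this illustration only; nothing here is compiled as inline assembly.

```c
#include <stdio.h>

/* Illustrative values, assumed for this sketch: a double input and a
   32-bit result.  */
#define IREG_SIZE 64
#define OREG_SIZE 32

/* Same selection logic as in the patch.  */
#if IREG_SIZE == 32
# define IREGS "s"
#else
# define IREGS "d"
#endif

#if OREG_SIZE == 32
# define OREGS "w"
#else
# define OREGS "x"
#endif

/* Hypothetical name.  The literal pieces are the ones used in the asm
   statement; the compiler concatenates them into a single string.  */
static const char asm_template[] =
  "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
  "fcvtzs" "\t%" OREGS "0, %" IREGS "1";

/* Hypothetical helper: print the assembled template.  */
void
print_asm_template (void)
{
  puts (asm_template);
}
```

With these sizes the template names a "d" register for the floating-point operands and a "w" register for the integer result, so the conversion writes a 32-bit register rather than the 64-bit "x" register that was previously hard-coded.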