[2/5,v3] New generic powf


Message ID

59CE28DE.8070003@arm.com

State

New, archived

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Message-ID: <59CE28DE.8070003@arm.com>
Date: Fri, 29 Sep 2017 12:05:02 +0100
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:31.0) Gecko/20100101 Thunderbird/31.8.0
MIME-Version: 1.0
To: GNU C Library <libc-alpha@sourceware.org>
CC: nd@arm.com
Subject: [PATCH 2/5 v3] New generic powf
References: <59CE27D3.4050205@arm.com>
In-Reply-To: <59CE27D3.4050205@arm.com>
Content-Type: multipart/mixed;
	boundary="------------040506070601050908010900"
NoDisclaimer: True
Received-SPF: None (protection.outlook.com: arm.com does not designate
	permitted sender hosts)
SpamDiagnosticOutput: 1:99
SpamDiagnosticMetadata: NSPM
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 29 Sep 2017 11:05:07.5882
	(UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0802MB2489

Commit Message

Szabolcs Nagy Sept. 29, 2017, 11:05 a.m. UTC

  v3:
- NEWS entry.

Comments

Joseph Myers Sept. 29, 2017, 4:22 p.m. UTC | #1

On Fri, 29 Sep 2017, Szabolcs Nagy wrote:

> +#if TOINT_INTRINSICS
> +#define POWF_SCALE_BITS EXP2F_TABLE_BITS
> +#else
> +#define POWF_SCALE_BITS 0
> +#endif

Missing preprocessor indentation inside #if ("# define").  OK with that 
fixed, provided you've done execution testing of this code on 
architectures both with and without TOINT_INTRINSICS used.
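
For reference, a sketch of what the requested change would presumably look like, with one space after '#' for each level of #if nesting. The fallback definitions of TOINT_INTRINSICS and EXP2F_TABLE_BITS are assumptions added only so the fragment compiles on its own; in the patch they come from math_config.h.

```c
/* Assumed fallbacks so this fragment is self-contained; the real values
   are set elsewhere in math_config.h.  */
#ifndef TOINT_INTRINSICS
# define TOINT_INTRINSICS 0
#endif
#ifndef EXP2F_TABLE_BITS
# define EXP2F_TABLE_BITS 5
#endif

/* Directives nested inside #if carry one space after the '#'.  */
#if TOINT_INTRINSICS
# define POWF_SCALE_BITS EXP2F_TABLE_BITS
#else
# define POWF_SCALE_BITS 0
#endif
#define POWF_SCALE ((double) (1 << POWF_SCALE_BITS))
```

Only whitespace changes, so the generated code should be identical either way.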

Patch

From e260a9f8a9279c231405593c449e1f5bd39b3fd1 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Mon, 4 Sep 2017 17:55:33 +0100
Subject: [PATCH 2/5] New generic powf

without wrapper on aarch64:
powf reciprocal-throughput: 4.2x faster
powf latency: 2.6x faster
old worst-case error: 1.11 ulp
new worst-case error: 0.82 ulp
aarch64 .text size: -780 bytes
aarch64 .rodata size: +144 bytes

powf(x,y) is implemented as exp2(y*log2(x)) with the same algorithms
that are used in exp2f and log2f, except that the log2f polynomial is
larger for extra precision and its output (and the exp2f input) may be
scaled by a power of 2 (POWF_SCALE) to simplify the argument reduction
step of exp2 (possible when efficient round-to-integer and
convert-to-integer operations are available).
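
A minimal sketch of the unscaled argument reduction (the case without toint intrinsics), to show what the scaling saves. It assumes a shift constant of 0x1.8p52 / N, which is not shown in this patch, and round-to-nearest mode; reduce_shift and the local asuint64 are hypothetical stand-ins written for this illustration, not the code in e_powf.c.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 5
#define N (1 << TABLE_BITS)

/* Stand-in for the bit-cast helper in math_config.h.  */
static uint64_t
asuint64 (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Split x into k/N + r with |r| <= 1/(2N).  Returns r and stores in *ki
   a word whose low bits hold k = round (N*x) modulo 2^51.  */
static double
reduce_shift (double x, uint64_t *ki)
{
  /* Assumed constant: doubles near 0x1.8p52 / N are spaced 1/N apart,
     so the addition rounds x to the nearest multiple of 1/N.  */
  const double shift = 0x1.8p52 / N;
  double kd = (double) (x + shift);
  *ki = asuint64 (kd);
  kd -= shift; /* k/N */
  return x - kd;
}
```

When the log2 output is already scaled by N, a single round-to-integer of the input yields k directly and no shift constant is needed, which is where TOINT_INTRINSICS helps.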

The special case handling tries to minimize the checks in the hot path.
When the input of exp2_inline is checked, integer arithmetic is used, as
that was faster on the tested aarch64 cores.
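
The integer check on the exp2_inline input can be sketched in isolation as follows, assuming POWF_SCALE is 1. The names abs_ge_126 and the local asuint64 are hypothetical and exist only for this illustration. The top 16 bits below the sign (11 exponent bits and 5 mantissa bits) are compared as an unsigned integer; 126.0 has no mantissa bits set below those five, so the truncation loses nothing.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the bit-cast helper in math_config.h.  */
static uint64_t
asuint64 (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Nonzero when |v| >= 126 (also for inf and NaN), using one shift,
   one mask and one unsigned compare instead of fabs plus an FP compare.  */
static int
abs_ge_126 (double v)
{
  return (asuint64 (v) >> 47 & 0xffff) >= asuint64 (126.0) >> 47;
}
```

The precise overflow and underflow thresholds are then tested with ordinary floating-point comparisons inside the rarely taken branch.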

2017-09-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>

	* math/Makefile (type-float-routines): Add e_powf_log2_data.
	* sysdeps/ieee754/flt-32/e_powf.c: New implementation.
	* sysdeps/ieee754/flt-32/e_powf_log2_data.c: New file.
	* sysdeps/ieee754/flt-32/math_config.h (__powf_data): Define.
	(issignalingf_inline): Likewise.
	(POWF_LOG2_TABLE_BITS): Likewise.
	(POWF_LOG2_POLY_ORDER): Likewise.
	(POWF_SCALE_BITS): Likewise.
	(POWF_SCALE): Likewise.
	* sysdeps/i386/fpu/e_powf_log2_data.c: New file.
	* sysdeps/ia64/fpu/e_powf_log2_data.c: New file.
	* sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c: New file.
---
 NEWS                                       |   2 +-
 math/Makefile                              |   2 +-
 sysdeps/i386/fpu/e_powf_log2_data.c        |   1 +
 sysdeps/ia64/fpu/e_powf_log2_data.c        |   1 +
 sysdeps/ieee754/flt-32/e_powf.c            | 388 ++++++++++++++---------------
 sysdeps/ieee754/flt-32/e_powf_log2_data.c  |  45 ++++
 sysdeps/ieee754/flt-32/math_config.h       |  27 ++
 sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c |   1 +
 8 files changed, 266 insertions(+), 201 deletions(-)
 create mode 100644 sysdeps/i386/fpu/e_powf_log2_data.c
 create mode 100644 sysdeps/ia64/fpu/e_powf_log2_data.c
 create mode 100644 sysdeps/ieee754/flt-32/e_powf_log2_data.c
 create mode 100644 sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c

diff --git a/NEWS b/NEWS
index 5e88c54a6b..f5821411ca 100644
--- a/NEWS
+++ b/NEWS
@@ -14,7 +14,7 @@  Major new features:
 
 * Optimized x86-64 trunc and truncf for processors with SSE4.1.
 
-* Optimized generic expf, exp2f, logf, log2f.
+* Optimized generic expf, exp2f, logf, log2f and powf.
 
 * In order to support faster and safer process termination the malloc API
   family of functions will no longer print a failure address and stack
diff --git a/math/Makefile b/math/Makefile
index b4b3101592..6c8aa3e413 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -116,7 +116,7 @@  type-double-routines := branred doasin dosincos halfulp mpa mpatan2	\
 # float support
 type-float-suffix := f
 type-float-routines := k_rem_pio2f math_errf e_exp2f_data e_logf_data	\
-		       e_log2f_data
+		       e_log2f_data e_powf_log2_data
 
 # _Float128 support
 type-float128-suffix := f128
diff --git a/sysdeps/i386/fpu/e_powf_log2_data.c b/sysdeps/i386/fpu/e_powf_log2_data.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/i386/fpu/e_powf_log2_data.c
@@ -0,0 +1 @@ 
+/* Not needed.  */
diff --git a/sysdeps/ia64/fpu/e_powf_log2_data.c b/sysdeps/ia64/fpu/e_powf_log2_data.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/ia64/fpu/e_powf_log2_data.c
@@ -0,0 +1 @@ 
+/* Not needed.  */
diff --git a/sysdeps/ieee754/flt-32/e_powf.c b/sysdeps/ieee754/flt-32/e_powf.c
index ce8e11f1ea..644a18d05e 100644
--- a/sysdeps/ieee754/flt-32/e_powf.c
+++ b/sysdeps/ieee754/flt-32/e_powf.c
@@ -1,7 +1,5 @@ 
-/* e_powf.c -- float version of e_pow.c.
- * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com.
- */
-/* Copyright (C) 2017 Free Software Foundation, Inc.
+/* Single-precision pow function.
+   Copyright (C) 2017 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,210 +16,202 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-
 #include <math.h>
-#include <math_private.h>
-
-static const float huge = 1.0e+30, tiny = 1.0e-30;
-
-static const float
-bp[] = {1.0, 1.5,},
-zero    =  0.0,
-one	=  1.0,
-two	=  2.0,
-two24	=  16777216.0,	/* 0x4b800000 */
-	/* poly coefs for (3/2)*(log(x)-2s-2/3*s**3 */
-L1  =  6.0000002384e-01, /* 0x3f19999a */
-L2  =  4.2857143283e-01, /* 0x3edb6db7 */
-L3  =  3.3333334327e-01, /* 0x3eaaaaab */
-L4  =  2.7272811532e-01, /* 0x3e8ba305 */
-L5  =  2.3066075146e-01, /* 0x3e6c3255 */
-L6  =  2.0697501302e-01, /* 0x3e53f142 */
-P1   =  1.6666667163e-01, /* 0x3e2aaaab */
-P2   = -2.7777778450e-03, /* 0xbb360b61 */
-P3   =  6.6137559770e-05, /* 0x388ab355 */
-P4   = -1.6533901999e-06, /* 0xb5ddea0e */
-P5   =  4.1381369442e-08, /* 0x3331bb4c */
-ovt =  4.2995665694e-08; /* -(128-log2(ovfl+.5ulp)) */
-
-static const double
-	dp[] = { 0.0, 0x1.2b803473f7ad1p-1, }, /* log2(1.5) */
-	lg2 = M_LN2,
-	cp = 2.0/3.0/M_LN2,
-	invln2 = 1.0/M_LN2;
+#include <stdint.h>
+#include "math_config.h"
 
-float
-__ieee754_powf(float x, float y)
+/*
+POWF_LOG2_POLY_ORDER = 5
+EXP2F_TABLE_BITS = 5
+
+ULP error: 0.82 (~ 0.5 + relerr*2^24)
+relerr: 1.27 * 2^-26 (Relative error ~= 128*Ln2*relerr_log2 + relerr_exp2)
+relerr_log2: 1.83 * 2^-33 (Relative error of logx.)
+relerr_exp2: 1.69 * 2^-34 (Relative error of exp2(ylogx).)
+*/
+
+#define N (1 << POWF_LOG2_TABLE_BITS)
+#define T __powf_log2_data.tab
+#define A __powf_log2_data.poly
+#define OFF 0x3f330000
+
+/* Subnormal input is normalized so ix has negative biased exponent.
+   Output is multiplied by N (POWF_SCALE) if TOINT_INTRINSICS is set.  */
+static inline double_t
+log2_inline (uint32_t ix)
 {
-	float z, ax, s;
-	double d1, d2;
-	int32_t i,j,k,yisint,n;
-	int32_t hx,hy,ix,iy;
-
-	GET_FLOAT_WORD(hy,y);
-	iy = hy&0x7fffffff;
-
-    /* y==zero: x**0 = 1 */
-	if(iy==0 && !issignaling (x)) return one;
-
-    /* x==+-1 */
-	if(x == 1.0 && !issignaling (y)) return one;
-	if(x == -1.0 && isinf(y)) return one;
-
-	GET_FLOAT_WORD(hx,x);
-	ix = hx&0x7fffffff;
-
-    /* +-NaN return x+y */
-	if(__builtin_expect(ix > 0x7f800000 ||
-			    iy > 0x7f800000, 0))
-		return x+y;
-
-    /* special value of y */
-	if (__builtin_expect(iy==0x7f800000, 0)) {	/* y is +-inf */
-	    if (ix==0x3f800000)
-		return  y - y;	/* inf**+-1 is NaN */
-	    else if (ix > 0x3f800000)/* (|x|>1)**+-inf = inf,0 */
-		return (hy>=0)? y: zero;
-	    else			/* (|x|<1)**-,+inf = inf,0 */
-		return (hy<0)?-y: zero;
-	}
-	if(iy==0x3f800000) {	/* y is  +-1 */
-	    if(hy<0) return one/x; else return x;
-	}
-	if(hy==0x40000000) return x*x; /* y is  2 */
-	if(hy==0x3f000000) {	/* y is  0.5 */
-	    if(__builtin_expect(hx>=0, 1))	/* x >= +0 */
-	    return __ieee754_sqrtf(x);
-	}
+  /* double_t for better performance on targets with FLT_EVAL_METHOD==2.  */
+  double_t z, r, r2, r4, p, q, y, y0, invc, logc;
+  uint32_t iz, top, tmp;
+  int k, i;
+
+  /* x = 2^k z; where z is in range [OFF,2*OFF] and exact.
+     The range is split into N subintervals.
+     The ith subinterval contains z and c is near its center.  */
+  tmp = ix - OFF;
+  i = (tmp >> (23 - POWF_LOG2_TABLE_BITS)) % N;
+  top = tmp & 0xff800000;
+  iz = ix - top;
+  k = (int32_t) top >> (23 - POWF_SCALE_BITS); /* arithmetic shift */
+  invc = T[i].invc;
+  logc = T[i].logc;
+  z = (double_t) asfloat (iz);
+
+  /* log2(x) = log1p(z/c-1)/ln2 + log2(c) + k */
+  r = z * invc - 1;
+  y0 = logc + (double_t) k;
+
+  /* Pipelined polynomial evaluation to approximate log1p(r)/ln2.  */
+  r2 = r * r;
+  y = A[0] * r + A[1];
+  p = A[2] * r + A[3];
+  r4 = r2 * r2;
+  q = A[4] * r + y0;
+  q = p * r2 + q;
+  y = y * r4 + q;
+  return y;
+}
 
-    /* determine if y is an odd int when x < 0
-     * yisint = 0	... y is not an integer
-     * yisint = 1	... y is an odd int
-     * yisint = 2	... y is an even int
-     */
-	yisint  = 0;
-	if(hx<0) {
-	    if(iy>=0x4b800000) yisint = 2; /* even integer y */
-	    else if(iy>=0x3f800000) {
-		k = (iy>>23)-0x7f;	   /* exponent */
-		j = iy>>(23-k);
-		if((j<<(23-k))==iy) yisint = 2-(j&1);
-	    }
-	}
+#undef N
+#undef T
+#define N (1 << EXP2F_TABLE_BITS)
+#define T __exp2f_data.tab
+#define SIGN_BIAS (1 << (EXP2F_TABLE_BITS + 11))
+
+/* The output of log2 and thus the input of exp2 is either scaled by N
+   (in case of fast toint intrinsics) or not.  The unscaled xd must be
+   in [-1021,1023], sign_bias sets the sign of the result.  */
+static inline double_t
+exp2_inline (double_t xd, unsigned long sign_bias)
+{
+  uint64_t ki, ski, t;
+  /* double_t for better performance on targets with FLT_EVAL_METHOD==2.  */
+  double_t kd, z, r, r2, y, s;
+
+#if TOINT_INTRINSICS
+# define C __exp2f_data.poly_scaled
+  /* N*x = k + r with r in [-1/2, 1/2] */
+  kd = roundtoint (xd); /* k */
+  ki = converttoint (xd);
+#else
+# define C __exp2f_data.poly
+# define SHIFT __exp2f_data.shift_scaled
+  /* x = k/N + r with r in [-1/(2N), 1/(2N)] */
+  kd = (double) (xd + SHIFT); /* Rounding to double precision is required.  */
+  ki = asuint64 (kd);
+  kd -= SHIFT; /* k/N */
+#endif
+  r = xd - kd;
+
+  /* exp2(x) = 2^(k/N) * 2^r ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */
+  t = T[ki % N];
+  ski = ki + sign_bias;
+  t += ski << (52 - EXP2F_TABLE_BITS);
+  s = asdouble (t);
+  z = C[0] * r + C[1];
+  r2 = r * r;
+  y = C[2] * r + 1;
+  y = z * r2 + y;
+  y = y * s;
+  return y;
+}
 
-	ax   = fabsf(x);
-    /* special value of x */
-	if(__builtin_expect(ix==0x7f800000||ix==0||ix==0x3f800000, 0)){
-	    z = ax;			/*x is +-0,+-inf,+-1*/
-	    if(hy<0) z = one/z;	/* z = (1/|x|) */
-	    if(hx<0) {
-		if(((ix-0x3f800000)|yisint)==0) {
-		    z = (z-z)/(z-z); /* (-1)**non-int is NaN */
-		} else if(yisint==1)
-		    z = -z;		/* (x<0)**odd = -(|x|**odd) */
-	    }
-	    return z;
-	}
+/* Returns 0 if not int, 1 if odd int, 2 if even int.  */
+static inline int
+checkint (uint32_t iy)
+{
+  int e = iy >> 23 & 0xff;
+  if (e < 0x7f)
+    return 0;
+  if (e > 0x7f + 23)
+    return 2;
+  if (iy & ((1 << (0x7f + 23 - e)) - 1))
+    return 0;
+  if (iy & (1 << (0x7f + 23 - e)))
+    return 1;
+  return 2;
+}
 
-    /* (x<0)**(non-int) is NaN */
-	if(__builtin_expect(((((uint32_t)hx>>31)-1)|yisint)==0, 0))
-	    return (x-x)/(x-x);
-
-    /* |y| is huge */
-	if(__builtin_expect(iy>0x4d000000, 0)) { /* if |y| > 2**27 */
-	/* over/underflow if x is not close to one */
-	    if(ix<0x3f7ffff8) return (hy<0)? huge*huge:tiny*tiny;
-	    if(ix>0x3f800007) return (hy>0)? huge*huge:tiny*tiny;
-	/* now |1-x| is tiny <= 2**-20, suffice to compute
-	   log(x) by x-x^2/2+x^3/3-x^4/4 */
-	    d2 = ax-1;		/* d2 has 20 trailing zeros.  */
-	    d2 = d2 * invln2 -
-		 (d2 * d2) * (0.5 - d2 * (0.333333333333 - d2 * 0.25)) * invln2;
-	} else {
-	    /* Avoid internal underflow for tiny y.  The exact value
-	       of y does not matter if |y| <= 2**-32.  */
-	    if (iy < 0x2f800000)
-	      SET_FLOAT_WORD (y, (hy & 0x80000000) | 0x2f800000);
-	    n = 0;
-	/* take care subnormal number */
-	    if(ix<0x00800000)
-		{ax *= two24; n -= 24; GET_FLOAT_WORD(ix,ax); }
-	    n  += ((ix)>>23)-0x7f;
-	    j  = ix&0x007fffff;
-	/* determine interval */
-	    ix = j|0x3f800000;		/* normalize ix */
-	    if(j<=0x1cc471) k=0;	/* |x|<sqrt(3/2) */
-	    else if(j<0x5db3d7) k=1;	/* |x|<sqrt(3)   */
-	    else {k=0;n+=1;ix -= 0x00800000;}
-	    SET_FLOAT_WORD(ax,ix);
-
-	/* compute d1 = (x-1)/(x+1) or (x-1.5)/(x+1.5) */
-	    d1 = (ax-(double)bp[k])/(ax+(double)bp[k]);
-	/* compute d2 = log(ax) */
-	    d2 = d1 * d1;
-	    d2 = 3.0 + d2 + d2*d2*(L1+d2*(L2+d2*(L3+d2*(L4+d2*(L5+d2*L6)))));
-	/* 2/(3log2)*(d2+...) */
-	    d2 = d1*d2*cp;
-	/* log2(ax) = (d2+..)*2/(3*log2) */
-	    d2 = d2+dp[k]+(double)n;
-	}
+static inline int
+zeroinfnan (uint32_t ix)
+{
+  return 2 * ix - 1 >= 2u * 0x7f800000 - 1;
+}
 
-	s = one; /* s (sign of result -ve**odd) = -1 else = 1 */
-	if(((((uint32_t)hx>>31)-1)|(yisint-1))==0)
-	    s = -one;	/* (-ve)**(odd int) */
-
-    /* compute y * d2 */
-	d1 = y * d2;
-	z = d1;
-	GET_FLOAT_WORD(j,z);
-	if (__builtin_expect(j>0x43000000, 0))		/* if z > 128 */
-	    return s*huge*huge;				/* overflow */
-	else if (__builtin_expect(j==0x43000000, 0)) {	/* if z == 128 */
-	    if(ovt>(z-d1)) return s*huge*huge;	/* overflow */
+float
+__ieee754_powf (float x, float y)
+{
+  unsigned long sign_bias = 0;
+  uint32_t ix, iy;
+
+  ix = asuint (x);
+  iy = asuint (y);
+  if (__glibc_unlikely (ix - 0x00800000 >= 0x7f800000 - 0x00800000
+			|| zeroinfnan (iy)))
+    {
+      /* Either (x < 0x1p-126 or inf or nan) or (y is 0 or inf or nan).  */
+      if (__glibc_unlikely (zeroinfnan (iy)))
+	{
+	  if (2 * iy == 0)
+	    return issignalingf_inline (x) ? x + y : 1.0f;
+	  if (ix == 0x3f800000)
+	    return issignalingf_inline (y) ? x + y : 1.0f;
+	  if (2 * ix > 2u * 0x7f800000 || 2 * iy > 2u * 0x7f800000)
+	    return x + y;
+	  if (2 * ix == 2 * 0x3f800000)
+	    return 1.0f;
+	  if ((2 * ix < 2 * 0x3f800000) == !(iy & 0x80000000))
+	    return 0.0f; /* |x|<1 && y==inf or |x|>1 && y==-inf.  */
+	  return y * y;
+	}
+      if (__glibc_unlikely (zeroinfnan (ix)))
+	{
+	  float_t x2 = x * x;
+	  if (ix & 0x80000000 && checkint (iy) == 1)
+	    {
+	      x2 = -x2;
+	      sign_bias = 1;
+	    }
+#if WANT_ERRNO
+	  if (2 * ix == 0 && iy & 0x80000000)
+	    return __math_divzerof (sign_bias);
+#endif
+	  return iy & 0x80000000 ? 1 / x2 : x2;
 	}
-	else if (__builtin_expect((j&0x7fffffff)>0x43160000, 0))/* z <= -150 */
-	    return s*tiny*tiny;				/* underflow */
-	else if (__builtin_expect((uint32_t) j==0xc3160000, 0)){/* z == -150*/
-	    if(0.0<=(z-d1)) return s*tiny*tiny;		/* underflow */
+      /* x and y are non-zero finite.  */
+      if (ix & 0x80000000)
+	{
+	  /* Finite x < 0.  */
+	  int yint = checkint (iy);
+	  if (yint == 0)
+	    return __math_invalidf (x);
+	  if (yint == 1)
+	    sign_bias = SIGN_BIAS;
+	  ix &= 0x7fffffff;
 	}
-    /*
-     * compute 2**d1
-     */
-	i = j&0x7fffffff;
-	k = (i>>23)-0x7f;
-	n = 0;
-	if(i>0x3f000000) {		/* if |z| > 0.5, set n = [z+0.5] */
-	    n = j+(0x00800000>>(k+1));
-	    k = ((n&0x7fffffff)>>23)-0x7f;	/* new k for n */
-	    SET_FLOAT_WORD(z,n&~(0x007fffff>>k));
-	    n = ((n&0x007fffff)|0x00800000)>>(23-k);
-	    if(j<0) n = -n;
-	    d1 -= z;
+      if (ix < 0x00800000)
+	{
+	  /* Normalize subnormal x so exponent becomes negative.  */
+	  ix = asuint (x * 0x1p23f);
+	  ix &= 0x7fffffff;
+	  ix -= 23 << 23;
 	}
-	d1 = d1 * lg2;
-	d2 = d1*d1;
-	d2 = d1 - d2*(P1+d2*(P2+d2*(P3+d2*(P4+d2*P5))));
-	d2 = (d1*d2)/(d2-two);
-	z = one - (d2-d1);
-	GET_FLOAT_WORD(j,z);
-	j += (n<<23);
-	if((j>>23)<=0)	/* subnormal output */
-	  {
-	    z = __scalbnf (z, n);
-	    float force_underflow = z * z;
-	    math_force_eval (force_underflow);
-	  }
-	else SET_FLOAT_WORD(z,j);
-	return s*z;
+    }
+  double_t logx = log2_inline (ix);
+  double_t ylogx = y * logx; /* Note: cannot overflow, y is single prec.  */
+  if (__glibc_unlikely ((asuint64 (ylogx) >> 47 & 0xffff)
+			>= asuint64 (126.0 * POWF_SCALE) >> 47))
+    {
+      /* |y*log2(x)| >= 126.  */
+      if (ylogx > 0x1.fffffffd1d571p+6 * POWF_SCALE)
+	return __math_oflowf (sign_bias);
+      if (ylogx <= -150.0 * POWF_SCALE)
+	return __math_uflowf (sign_bias);
+#if WANT_ERRNO_UFLOW
+      if (ylogx < -149.0 * POWF_SCALE)
+	return __math_may_uflowf (sign_bias);
+#endif
+    }
+  return (float) exp2_inline (ylogx, sign_bias);
 }
 strong_alias (__ieee754_powf, __powf_finite)
diff --git a/sysdeps/ieee754/flt-32/e_powf_log2_data.c b/sysdeps/ieee754/flt-32/e_powf_log2_data.c
new file mode 100644
index 0000000000..7cff06f59b
--- /dev/null
+++ b/sysdeps/ieee754/flt-32/e_powf_log2_data.c
@@ -0,0 +1,45 @@ 
+/* Data definition for powf.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "math_config.h"
+
+const struct powf_log2_data __powf_log2_data = {
+  .tab = {
+  { 0x1.661ec79f8f3bep+0, -0x1.efec65b963019p-2 * POWF_SCALE },
+  { 0x1.571ed4aaf883dp+0, -0x1.b0b6832d4fca4p-2 * POWF_SCALE },
+  { 0x1.49539f0f010bp+0, -0x1.7418b0a1fb77bp-2 * POWF_SCALE },
+  { 0x1.3c995b0b80385p+0, -0x1.39de91a6dcf7bp-2 * POWF_SCALE },
+  { 0x1.30d190c8864a5p+0, -0x1.01d9bf3f2b631p-2 * POWF_SCALE },
+  { 0x1.25e227b0b8eap+0, -0x1.97c1d1b3b7afp-3 * POWF_SCALE },
+  { 0x1.1bb4a4a1a343fp+0, -0x1.2f9e393af3c9fp-3 * POWF_SCALE },
+  { 0x1.12358f08ae5bap+0, -0x1.960cbbf788d5cp-4 * POWF_SCALE },
+  { 0x1.0953f419900a7p+0, -0x1.a6f9db6475fcep-5 * POWF_SCALE },
+  { 0x1p+0, 0x0p+0 * POWF_SCALE },
+  { 0x1.e608cfd9a47acp-1, 0x1.338ca9f24f53dp-4 * POWF_SCALE },
+  { 0x1.ca4b31f026aap-1, 0x1.476a9543891bap-3 * POWF_SCALE },
+  { 0x1.b2036576afce6p-1, 0x1.e840b4ac4e4d2p-3 * POWF_SCALE },
+  { 0x1.9c2d163a1aa2dp-1, 0x1.40645f0c6651cp-2 * POWF_SCALE },
+  { 0x1.886e6037841edp-1, 0x1.88e9c2c1b9ff8p-2 * POWF_SCALE },
+  { 0x1.767dcf5534862p-1, 0x1.ce0a44eb17bccp-2 * POWF_SCALE },
+  },
+  .poly = {
+  0x1.27616c9496e0bp-2 * POWF_SCALE, -0x1.71969a075c67ap-2 * POWF_SCALE,
+  0x1.ec70a6ca7baddp-2 * POWF_SCALE, -0x1.7154748bef6c8p-1 * POWF_SCALE,
+  0x1.71547652ab82bp0 * POWF_SCALE,
+  }
+};
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index f869fbc66c..7e78cb0c96 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -21,6 +21,7 @@ 
 
 #include <math.h>
 #include <math_private.h>
+#include <nan-high-order-bit.h>
 #include <stdint.h>
 
 #ifndef WANT_ROUNDING
@@ -90,6 +91,15 @@  asdouble (uint64_t i)
   return u.f;
 }
 
+static inline int
+issignalingf_inline (float x)
+{
+  uint32_t ix = asuint (x);
+  if (HIGH_ORDER_BIT_IS_SET_FOR_SNAN)
+    return (ix & 0x7fc00000) == 0x7fc00000;
+  return 2 * (ix ^ 0x00400000) > 2u * 0x7fc00000;
+}
+
 #define NOINLINE __attribute__ ((noinline))
 
 attribute_hidden float __math_oflowf (unsigned long);
@@ -134,4 +144,21 @@  extern const struct log2f_data
   double poly[LOG2F_POLY_ORDER];
 } __log2f_data attribute_hidden;
 
+#define POWF_LOG2_TABLE_BITS 4
+#define POWF_LOG2_POLY_ORDER 5
+#if TOINT_INTRINSICS
+#define POWF_SCALE_BITS EXP2F_TABLE_BITS
+#else
+#define POWF_SCALE_BITS 0
+#endif
+#define POWF_SCALE ((double) (1 << POWF_SCALE_BITS))
+extern const struct powf_log2_data
+{
+  struct
+  {
+    double invc, logc;
+  } tab[1 << POWF_LOG2_TABLE_BITS];
+  double poly[POWF_LOG2_POLY_ORDER];
+} __powf_log2_data attribute_hidden;
+
 #endif
diff --git a/sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c b/sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c
new file mode 100644
index 0000000000..1cc8931700
--- /dev/null
+++ b/sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c
@@ -0,0 +1 @@ 
+/* Not needed.  */
-- 
2.11.0
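
The two classification helpers in e_powf.c rely on unsigned wraparound and on the binary32 bit layout. The stand-alone copy below can be used to check them on a few values; the memcpy-based asuint is an assumed stand-in for the helper in math_config.h, whose definition is not part of this patch.

```c
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for asuint from math_config.h.  */
static uint32_t
asuint (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* True for +-0, +-inf and NaN.  Doubling drops the sign bit; subtracting
   1 wraps zero around to UINT32_MAX so one compare covers all cases.  */
static int
zeroinfnan (uint32_t ix)
{
  return 2 * ix - 1 >= 2u * 0x7f800000 - 1;
}

/* Returns 0 if not int, 1 if odd int, 2 if even int.  */
static int
checkint (uint32_t iy)
{
  int e = iy >> 23 & 0xff;  /* biased exponent, sign ignored */
  if (e < 0x7f)
    return 0;               /* |y| < 1 */
  if (e > 0x7f + 23)
    return 2;               /* |y| >= 2^24: every such float is even */
  if (iy & ((1 << (0x7f + 23 - e)) - 1))
    return 0;               /* fractional bits set */
  if (iy & (1 << (0x7f + 23 - e)))
    return 1;               /* units bit set */
  return 2;
}
```

For example, checkint (asuint (3.0f)) gives 1 and checkint (asuint (2.5f)) gives 0, while zeroinfnan is true for asuint (-0.0f) and false for the smallest subnormal.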