From patchwork Thu Oct 20 23:26:29 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joseph Myers
X-Patchwork-Id: 16714
Received: (qmail 59918 invoked by alias); 20 Oct 2016 23:26:48 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 59867 invoked by uid 89); 20 Oct 2016 23:26:48 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00,
 RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2
 spammy=roots, wildly, Hx-languages-length:4473
X-HELO: relay1.mentorg.com
Date: Thu, 20 Oct 2016 23:26:29 +0000
From: Joseph Myers
To:
Subject: Use VSQRT instruction for ARM sqrt (bug 20660) [committed]
Message-ID:
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To
 svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1)

This patch makes ARM sqrt and sqrtf use the VSQRT VFP square root
instruction when available, instead of much larger generic code for
computing square roots.

Now, GCC will normally inline sqrt calls except for negative arguments
where errno needs to be set, and because the benchtests fail to use
-fno-builtin that means no significant difference in benchmark results
for sqrt (note, however, there are lots of __ieee754_sqrt calls
internally in libm, which are *not* inlined - although some
architectures define __ieee754_sqrt in their math_private.h for that
purpose, ARM doesn't - so improving out-of-line sqrt performance is
still relevant to those other functions, if not for most ordinary
direct users of sqrt).
With the benchtests changed to use -fno-builtin for sqrt tests, typical
performance results before the change are ("max" is wildly varying in
any case):

"duration": 9.88358e+09, "iterations": 4.8783e+07, "max": 457.764,
"min": 183.105, "mean": 202.603

and after it are:

"duration": 9.45663e+09, "iterations": 2.24385e+08, "max": 274.659,
"min": 30.517, "mean": 42.1447

Tested for ARM (hard-float and soft-float). Committed.

2016-10-20  Joseph Myers

	[BZ #20660]
	* sysdeps/arm/e_sqrt.c: New file.
	* sysdeps/arm/e_sqrtf.c: Likewise.

diff --git a/sysdeps/arm/e_sqrt.c b/sysdeps/arm/e_sqrt.c
new file mode 100644
index 0000000..8ba85ed
--- /dev/null
+++ b/sysdeps/arm/e_sqrt.c
@@ -0,0 +1,45 @@
+/* Compute square root for double.  ARM version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifdef __SOFTFP__
+
+/* Use architecture-independent sqrt implementation.  */
+# include
+
+#else
+
+/* Use VFP square root instruction.  */
+# include
+# include
+
+double
+__ieee754_sqrt (double x)
+{
+  double ret;
+# if __ARM_ARCH >= 6
+  asm ("vsqrt.f64 %P0, %P1" : "=w" (ret) : "w" (x));
+# else
+  /* As in GCC, for VFP9 Erratum 760019 avoid overwriting the
+     input.  */
+  asm ("vsqrt.f64 %P0, %P1" : "=&w" (ret) : "w" (x));
+# endif
+  return ret;
+}
+strong_alias (__ieee754_sqrt, __sqrt_finite)
+
+#endif
diff --git a/sysdeps/arm/e_sqrtf.c b/sysdeps/arm/e_sqrtf.c
new file mode 100644
index 0000000..282f7c3
--- /dev/null
+++ b/sysdeps/arm/e_sqrtf.c
@@ -0,0 +1,45 @@
+/* Compute square root for float.  ARM version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifdef __SOFTFP__
+
+/* Use architecture-independent sqrtf implementation.  */
+# include
+
+#else
+
+/* Use VFP square root instruction.  */
+# include
+# include
+
+float
+__ieee754_sqrtf (float x)
+{
+  float ret;
+# if __ARM_ARCH >= 6
+  asm ("vsqrt.f32 %0, %1" : "=t" (ret) : "t" (x));
+# else
+  /* As in GCC, for VFP9 Erratum 760019 avoid overwriting the
+     input.  */
+  asm ("vsqrt.f32 %0, %1" : "=&t" (ret) : "t" (x));
+# endif
+  return ret;
+}
+strong_alias (__ieee754_sqrtf, __sqrtf_finite)
+
+#endif
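
The benchmark figures quoted in the message can be cross-checked with a
little arithmetic. The following is a minimal sketch, not part of the
patch, and it assumes that the reported "mean" is simply "duration"
divided by "iterations". The names bench_result, bench_mean,
bench_speedup and bench_report are hypothetical and do not come from
glibc or its benchtests.

```c
#include <stdio.h>

/* One benchtests result record, holding only the fields used here.
   The struct and function names are hypothetical, not from glibc.  */
struct bench_result
{
  double duration;	/* "duration" as reported.  */
  double iterations;	/* "iterations" as reported.  */
};

/* Mean cost per iteration, assuming "mean" is duration / iterations.  */
static double
bench_mean (struct bench_result r)
{
  return r.duration / r.iterations;
}

/* Ratio of the mean before the change to the mean after it.  */
static double
bench_speedup (struct bench_result before, struct bench_result after)
{
  return bench_mean (before) / bench_mean (after);
}

/* Figures quoted in the commit message for the -fno-builtin runs.  */
static const struct bench_result sqrt_before = { 9.88358e+09, 4.8783e+07 };
static const struct bench_result sqrt_after = { 9.45663e+09, 2.24385e+08 };

/* Print both recomputed means and their ratio.  */
static void
bench_report (void)
{
  printf ("mean before: %g\n", bench_mean (sqrt_before));
  printf ("mean after:  %g\n", bench_mean (sqrt_after));
  printf ("ratio:       %g\n", bench_speedup (sqrt_before, sqrt_after));
}
```

Under that assumption the recomputed means agree with the quoted
202.603 and 42.1447, which puts the out-of-line sqrt at roughly 4.8
times faster on average after the change. Calling bench_report () from
a main function prints the three values.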