From patchwork Wed May 22 16:54:48 2019
X-Patchwork-Submitter: Steve Ellcey <sellcey@marvell.com>
X-Patchwork-Id: 32798
From: Steve Ellcey <sellcey@marvell.com>
To: libc-alpha@sourceware.org
Subject: [PATCH v2] Aarch64: Add simd exp/expf functions
Date: Wed, 22 May 2019 16:54:48 +0000

Here is an updated version of my patch to add libmvec and vector exp
functions to Aarch64.  GCC 9.1 has now been released, so the build no
longer depends on an unreleased compiler.  I have not added any assembly
trampolines to allow older compilers to be used.

The SIMD ABI that this patch uses is supported by GCC 9.1 and is defined
at:

https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi

If you build with GCC 9.1 (or any compiler that supports the
aarch64_vector_pcs attribute) you will get libmvec by default; otherwise
you will not.  If you try to build libmvec using a compiler without
aarch64_vector_pcs support, configure will fail.  There was a question of
whether building libmvec should be optional or not.  I don't have a
strong opinion on that, but I would be interested in what others think.
I could change this to require aarch64_vector_pcs attribute support in
all cases and always build libmvec, if that is what we want.

I added static *_finite function names so that they are not exported.
If Wilco's patch to remove the *_finite names entirely is approved, I can
remove this part of the patch.
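As a rough illustration of how callers are expected to pick these
functions up (this sketch is not part of the patch, and the exact
compiler options are my assumption rather than something the patch
mandates): with the SIMD declarations from bits/math-vector.h below and
GCC 9.1, compiling a loop like the one here with -Ofast (or -ffast-math
plus -fopenmp-simd and an omp simd pragma) should let the vectorizer
replace the scalar exp calls with calls to _ZGVnN2v_exp, with libmvec
(e.g. -lmvec) added at link time.

/* Hypothetical usage sketch, not part of the patch.  */
#include <math.h>

void
vec_exp_loop (double *restrict y, const double *restrict x, long n)
{
  /* With -Ofast and GCC 9.1, each pair of iterations should become one
     _ZGVnN2v_exp call.  */
  for (long i = 0; i < n; i++)
    y[i] = exp (x[i]);
}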
I removed the 'if (aarch64)' conditionals from math-vector-fortran.h (and
fixed my use of the BIG_ENDIAN macros); this means that the vector exp
and expf routines should get used in big-endian and little-endian modes
for Fortran (just like C).  I have not done any big-endian testing at
this point because I have been doing all my testing on a little-endian
Aarch64 Linux box.  If anyone has ideas on how to do big-endian testing I
would be interested.  I am guessing I would have to build an ELF target
and test with QEMU or something like that, but I haven't done a
build/test setup like that in quite a while.

Steve Ellcey
sellcey@marvell.com

2019-05-22  Steve Ellcey  <sellcey@marvell.com>

	* NEWS: Add entry about libmvec support on aarch64.
	* sysdeps/aarch64/configure.ac (build_mathvec): Check for ABI
	support, build libmvec if support exists.
	* sysdeps/aarch64/configure: Regenerate.
	* sysdeps/aarch64/fpu/Makefile (CFLAGS-libmvec_double_vlen2_exp.c):
	Set flag.
	(CFLAGS-libmvec_float_vlen4_expf.c): Likewise.
	(CFLAGS-libmvec_exp_data.c): Likewise.
	(CFLAGS-libmvec_exp2f_data.c): Likewise.
	(libmvec-support): Add libmvec_double_vlen2_exp,
	libmvec_float_vlen4_expf, libmvec_exp_data, libmvec_exp2f_data,
	libmvec_aliases to list.
	(libmvec-static-only-routines): Add libmvec_aliases to list.
	(libmvec-tests): Add double-vlen2, float-vlen4 to list.
	(double-vlen2-funcs): Add new vector function name.
	(float-vlen4-funcs): Add new vector function name.
	* sysdeps/aarch64/fpu/Versions: New file.
	* sysdeps/aarch64/fpu/bits/math-vector.h: New file.
	* sysdeps/aarch64/fpu/finclude/math-vector-fortran.h: New file.
	* sysdeps/aarch64/fpu/libmvec_aliases.c: New file.
	* sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp2f_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c: New file.
	* sysdeps/aarch64/fpu/libmvec_util.h: New file.
	* sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c: New file.
	* sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c: New file.
	* sysdeps/aarch64/libm-test-ulps (exp_vlen2): New entry.
	(exp_vlen4): Likewise.
	* sysdeps/unix/sysv/linux/aarch64/libmvec.abilist: New file.

diff --git a/NEWS b/NEWS
index 0e4c57f273..b1845c02e4 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Version 2.30
 
 Major new features:
 
+* Aarch64 now supports libmvec.  Building libmvec on aarch64 requires
+  a compiler that supports the vector function ABI that is defined at
+  https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
+
+  GCC 9.1 has support for this ABI.  The current libmvec for aarch64
+  has vector versions of the exp and expf functions.
+
 * Unicode 12.1.0 Support: Character encoding, character type info, and
   transliteration tables are all updated to Unicode 12.1.0, using
   generator scripts contributed by Mike FABIAN (Red Hat).
diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
index 7851dd4dac..5c56511deb 100644
--- a/sysdeps/aarch64/configure.ac
+++ b/sysdeps/aarch64/configure.ac
@@ -20,3 +20,27 @@ if test $libc_cv_aarch64_be = yes; then
 else
   LIBC_CONFIG_VAR([default-abi], [lp64])
 fi
+
+AC_CACHE_CHECK([for pcs attribute support],
+	       libc_cv_gcc_pcs_attribute, [dnl
+cat > conftest.c <&AS_MESSAGE_LOG_FD \
+		 2>&AS_MESSAGE_LOG_FD ; then
+  libc_cv_gcc_pcs_attribute=yes
+fi
+rm -f conftest*])
+
+if test x"$build_mathvec" = xyes; then
+  if test $libc_cv_gcc_pcs_attribute = no; then
+    AC_MSG_ERROR([--enable-mathvec requires a gcc that supports the aarch64_vector_pcs attribute])
+  fi
+fi
+
+if test x"$build_mathvec" = xnotset; then
+  if test $libc_cv_gcc_pcs_attribute = yes; then
+    build_mathvec=yes
+  fi
+fi
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 4a182bd6d6..c0720484e2 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -12,3 +12,27 @@ CFLAGS-s_fmaxf.c += -ffinite-math-only
 CFLAGS-s_fmin.c += -ffinite-math-only
 CFLAGS-s_fminf.c += -ffinite-math-only
 endif
+
+ifeq ($(subdir),mathvec)
+CFLAGS-libmvec_double_vlen2_exp.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_float_vlen4_expf.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp2f_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_aliases.c += -march=armv8-a+simd -fno-math-errno
+
+libmvec-support += libmvec_double_vlen2_exp
+libmvec-support += libmvec_float_vlen4_expf
+libmvec-support += libmvec_exp_data
+libmvec-support += libmvec_exp2f_data
+libmvec-support += libmvec_aliases
+
+libmvec-static-only-routines += libmvec_aliases
+endif
+
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2 float-vlen4
+double-vlen2-funcs = exp
+float-vlen4-funcs = exp
+endif
+endif
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index e69de29bb2..da36f3c495 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -0,0 +1,5 @@
+libmvec {
+  GLIBC_2.30 {
+    _ZGVnN2v_exp; _ZGVnN4v_expf;
+  }
+}
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index e69de29bb2..4c3415987a 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -0,0 +1,43 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations.  */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case.  */
+#  define __DECL_SIMD_AARCH64 _Pragma ("omp declare simd notinbranch")
+# elif __GNUC_PREREQ (6,0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
+#  define __DECL_SIMD_AARCH64 __attribute__ ((__simd__ ("notinbranch")))
+# endif
+
+# ifdef __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_exp
+#  define __DECL_SIMD_exp __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_expf
+#  define __DECL_SIMD_expf __DECL_SIMD_AARCH64
+
+# endif
+#endif
diff --git a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
index e69de29bb2..293983eb2c 100644
--- a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
@@ -0,0 +1,20 @@
+! Platform-specific declarations of SIMD math functions for Fortran. -*- f90 -*-
+!   Copyright (C) 2019 Free Software Foundation, Inc.
+!   This file is part of the GNU C Library.
+!
+!   The GNU C Library is free software; you can redistribute it and/or
+!   modify it under the terms of the GNU Lesser General Public
+!   License as published by the Free Software Foundation; either
+!   version 2.1 of the License, or (at your option) any later version.
+!
+!   The GNU C Library is distributed in the hope that it will be useful,
+!   but WITHOUT ANY WARRANTY; without even the implied warranty of
+!   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+!   Lesser General Public License for more details.
+!
+!   You should have received a copy of the GNU Lesser General Public
+!   License along with the GNU C Library; if not, see
+!   <http://www.gnu.org/licenses/>.
+
+!GCC$ builtin (exp) attributes simd (notinbranch)
+!GCC$ builtin (expf) attributes simd (notinbranch)
diff --git a/sysdeps/aarch64/fpu/libmvec_aliases.c b/sysdeps/aarch64/fpu/libmvec_aliases.c
index e69de29bb2..bc3f9b8118 100644
--- a/sysdeps/aarch64/fpu/libmvec_aliases.c
+++ b/sysdeps/aarch64/fpu/libmvec_aliases.c
@@ -0,0 +1,40 @@
+/* These aliases are added as a workaround to exclude unnecessary symbol
+   aliases in libmvec.so while the compiler creates the vector names
+   based on the scalar asm name.  The corresponding discussion is at
+   .
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+
+extern __attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x);
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v___exp_finite (float64x2_t x)
+{
+  return _ZGVnN2v_exp (x);
+}
+
+extern __attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x);
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v___expf_finite (float32x4_t x)
+{
+  return _ZGVnN4v_expf (x);
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
index e69de29bb2..ce618c8859 100644
--- a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
+++ b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
@@ -0,0 +1,94 @@
+/* Double-precision 2 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/dbl-64/e_exp.c.  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include "math_config.h"
+#include "libmvec_util.h"
+
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+
+#define LIMIT 700.0
+
+/* Do not inline this call.  That way _ZGVnN2v_exp has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN2v_exp has to do.  */
+
+__attribute__((aarch64_vector_pcs, noinline)) static float64x2_t
+__scalar_exp (float64x2_t x)
+{
+  return (float64x2_t) { exp(x[0]), exp(x[1]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x)
+{
+  double h, z_0, z_1;
+  float64x2_t g, scale_v, tail_v, tmp_v, r_v, r2_v, kd_v;
+  float64x2_t NegLn2hiN_v, NegLn2loN_v, C2_v, C3_v, C4_v, C5_v;
+  uint64_t ki_0, ki_1, idx_0, idx_1;
+  uint64_t top_0, top_1, sbits_0, sbits_1;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f64 (x);
+  h = vmaxnmvq_f64 (g);
+  if (__glibc_unlikely (!(h < LIMIT)))
+    return __scalar_exp (x);
+
+  z_0 = InvLn2N * x[0];
+  z_1 = InvLn2N * x[1];
+  ki_0 = converttoint (z_0);
+  ki_1 = converttoint (z_1);
+
+  idx_0 = 2 * (ki_0 % N);
+  idx_1 = 2 * (ki_1 % N);
+  top_0 = ki_0 << (52 - EXP_TABLE_BITS);
+  top_1 = ki_1 << (52 - EXP_TABLE_BITS);
+  sbits_0 = T[idx_0 + 1] + top_0;
+  sbits_1 = T[idx_1 + 1] + top_1;
+
+  kd_v = (float64x2_t) { roundtoint (z_0), roundtoint (z_1) };
+  scale_v = (float64x2_t) { asdouble (sbits_0), asdouble (sbits_1) };
+  tail_v = (float64x2_t) { asdouble (T[idx_0]), asdouble (T[idx_1]) };
+  NegLn2hiN_v = (float64x2_t) { NegLn2hiN, NegLn2hiN };
+  NegLn2loN_v = (float64x2_t) { NegLn2loN, NegLn2loN };
+  C2_v = (float64x2_t) { C2, C2 };
+  C3_v = (float64x2_t) { C3, C3 };
+  C4_v = (float64x2_t) { C4, C4 };
+  C5_v = (float64x2_t) { C5, C5 };
+
+  r_v = x + kd_v * NegLn2hiN_v + kd_v * NegLn2loN_v;
+  r2_v = r_v * r_v;
+  tmp_v = tail_v + r_v + r2_v * (C2_v + r_v * C3_v) + r2_v * r2_v
+	  * (C4_v + r_v * C5_v);
+  return scale_v + scale_v * tmp_v;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
index e69de29bb2..d97ce157b0 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
@@ -0,0 +1,2 @@
+#include
+#include
diff --git a/sysdeps/aarch64/fpu/libmvec_exp_data.c b/sysdeps/aarch64/fpu/libmvec_exp_data.c
index e69de29bb2..a83661b39d 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp_data.c
@@ -0,0 +1 @@
+#include
diff --git a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
index e69de29bb2..938c72ddab 100644
--- a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
+++ b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
@@ -0,0 +1,114 @@
+/* Single-precision 4 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/flt-32/e_expf.c.  */
+
+#include
+#include
+#include
+#include
+#include "libmvec_util.h"
+
+#define N (1 << EXP2F_TABLE_BITS)
+#define LIMIT 80.0
+
+#define InvLn2N __exp2f_data.invln2_scaled
+#define T __exp2f_data.tab
+#define C __exp2f_data.poly_scaled
+#define SHIFT __exp2f_data.shift
+
+/* Do not inline this call.  That way _ZGVnN4v_expf has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN4v_expf has to do.  */
+
+__attribute__((aarch64_vector_pcs,noinline)) static float32x4_t
+__scalar_expf (float32x4_t x)
+{
+  return (float32x4_t) { expf(x[0]), expf(x[1]), expf(x[2]), expf(x[3]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x)
+{
+  float32x4_t g, result;
+  float64x2_t xd_0, xd_1, vInvLn2N, z_0, z_1, vkd_0, vkd_1, r_0, r_1;
+  float64x2_t vs_0, vs_1, c0, c1, c2, y_0, y_1, r2_0, r2_1, one;
+  uint64_t ki_0, ki_1, ki_2, ki_3, t_0, t_1, t_2, t_3;
+  double s_0, s_1, s_2, s_3;
+  float f;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f32 (x);
+  f = vmaxnmvq_f32 (g);
+  if (__glibc_unlikely (!(f < LIMIT)))
+    return __scalar_expf (x);
+
+  xd_0 = get_lo_and_extend (x);
+  xd_1 = get_hi_and_extend (x);
+
+  vInvLn2N = (float64x2_t) { InvLn2N, InvLn2N };
+  /* x*N/Ln2 = k + r with r in [-1/2, 1/2] and int k.  */
+  z_0 = vInvLn2N * xd_0;
+  z_1 = vInvLn2N * xd_1;
+
+  /* Round and convert z to int, the result is in [-150*N, 128*N] and
+     ideally ties-to-even rule is used, otherwise the magnitude of r
+     can be bigger which gives larger approximation error.  */
+  vkd_0 = vrndaq_f64 (z_0);
+  vkd_1 = vrndaq_f64 (z_1);
+  r_0 = z_0 - vkd_0;
+  r_1 = z_1 - vkd_1;
+
+  ki_0 = (long) vkd_0[0];
+  ki_1 = (long) vkd_0[1];
+  ki_2 = (long) vkd_1[0];
+  ki_3 = (long) vkd_1[1];
+
+  /* exp(x) = 2^(k/N) * 2^(r/N) ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */
+  t_0 = T[ki_0 % N];
+  t_1 = T[ki_1 % N];
+  t_2 = T[ki_2 % N];
+  t_3 = T[ki_3 % N];
+  t_0 += ki_0 << (52 - EXP2F_TABLE_BITS);
+  t_1 += ki_1 << (52 - EXP2F_TABLE_BITS);
+  t_2 += ki_2 << (52 - EXP2F_TABLE_BITS);
+  t_3 += ki_3 << (52 - EXP2F_TABLE_BITS);
+  s_0 = asdouble (t_0);
+  s_1 = asdouble (t_1);
+  s_2 = asdouble (t_2);
+  s_3 = asdouble (t_3);
+
+  vs_0 = (float64x2_t) { s_0, s_1 };
+  vs_1 = (float64x2_t) { s_2, s_3 };
+  c0 = (float64x2_t) { C[0], C[0] };
+  c1 = (float64x2_t) { C[1], C[1] };
+  c2 = (float64x2_t) { C[2], C[2] };
+  one = (float64x2_t) { 1.0, 1.0 };
+
+  z_0 = c0 * r_0 + c1;
+  z_1 = c0 * r_1 + c1;
+  r2_0 = r_0 * r_0;
+  r2_1 = r_1 * r_1;
+  y_0 = c2 * r_0 + one;
+  y_1 = c2 * r_1 + one;
+  y_0 = z_0 * r2_0 + y_0;
+  y_1 = z_1 * r2_1 + y_1;
+  y_0 = y_0 * vs_0;
+  y_1 = y_1 * vs_1;
+  result = pack_and_trunc (y_0, y_1);
+  return result;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_util.h b/sysdeps/aarch64/fpu/libmvec_util.h
index e69de29bb2..bd0463ce22 100644
--- a/sysdeps/aarch64/fpu/libmvec_util.h
+++ b/sysdeps/aarch64/fpu/libmvec_util.h
@@ -0,0 +1,54 @@
+/* Utility functions for Aarch64 vector functions.
+   Copyright (C) 2015-2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include
+#include
+
+/* Copy the lower 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_lo_and_extend (float32x4_t x)
+{
+  __Uint64x2_t tmp1 = (__Uint64x2_t) x;
+#if __BYTE_ORDER == __BIG_ENDIAN
+  uint64_t tmp2 = (uint64_t) tmp1[1];
+#else
+  uint64_t tmp2 = (uint64_t) tmp1[0];
+#endif
+  return vcvt_f64_f32 ((float32x2_t) tmp2);
+}
+
+/* Copy the upper 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_hi_and_extend (float32x4_t x)
+{
+  return vcvt_high_f64_f32 (x);
+}
+
+/* Copy a pair of 2 element double vectors into a 4 element float vector.  */
+
+static __always_inline float32x4_t
+pack_and_trunc (float64x2_t x, float64x2_t y)
+{
+  float32x2_t xx = vcvt_f32_f64 (x);
+  float32x2_t yy = vcvt_f32_f64 (y);
+  return (vcombine_f32 (xx, yy));
+}
diff --git a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
index e69de29bb2..9eb31c8dfc 100644
--- a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
@@ -0,0 +1,24 @@
+/* Wrapper part of tests for aarch64 double vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include
+#include "test-double-vlen2.h"
+
+#define VEC_TYPE float64x2_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVnN2v_exp)
diff --git a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
index e69de29bb2..7f64acf886 100644
--- a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
@@ -0,0 +1,24 @@
+/* Wrapper part of tests for float aarch64 vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include
+#include "test-float-vlen4.h"
+
+#define VEC_TYPE float32x4_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVnN4v_expf)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 585e5bbce7..1ed4af9e55 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1601,6 +1601,12 @@ float: 1
 idouble: 1
 ifloat: 1
 
+Function: "exp_vlen2":
+double: 1
+
+Function: "exp_vlen4":
+float: 1
+
 Function: "expm1":
 double: 1
 float: 1
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index e69de29bb2..9e178253f7 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -0,0 +1,2 @@
+GLIBC_2.30 _ZGVnN2v_exp F
+GLIBC_2.30 _ZGVnN4v_expf F
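
For completeness, here is a rough way to exercise the new entry points
directly, outside the glibc test harness.  This is only a sketch and is
not part of the patch; it assumes a GCC 9.1 (or later) toolchain for the
aarch64_vector_pcs attribute and that the program is linked against the
new libmvec (for example with -lmvec -lm).

/* Hypothetical standalone check, not part of the patch.  */
#include <arm_neon.h>
#include <math.h>
#include <stdio.h>

extern __attribute__ ((aarch64_vector_pcs)) float64x2_t
_ZGVnN2v_exp (float64x2_t);
extern __attribute__ ((aarch64_vector_pcs)) float32x4_t
_ZGVnN4v_expf (float32x4_t);

int
main (void)
{
  float64x2_t vd = _ZGVnN2v_exp ((float64x2_t) { 1.0, -2.5 });
  float32x4_t vf = _ZGVnN4v_expf ((float32x4_t) { 0.5f, 1.0f, 2.0f, -3.0f });

  /* Results should agree with the scalar routines to within the 1 ulp
     recorded in the new exp_vlen2/exp_vlen4 libm-test-ulps entries.  */
  printf ("exp(1.0):  vector %.17g  scalar %.17g\n", vd[0], exp (1.0));
  printf ("expf(0.5): vector %.9g  scalar %.9g\n", (double) vf[0],
          (double) expf (0.5f));
  return 0;
}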