From patchwork Wed Nov 24 19:37:34 2021
X-Patchwork-Submitter: Sunil Pandey
X-Patchwork-Id: 48075
From: Sunil Pandey
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com
Subject: [PATCH 09/42] x86-64: Add vector atan/atanf implementation to libmvec
Date: Wed, 24 Nov 2021 11:37:34 -0800
Message-Id: <20211124193807.2093208-10-skpgkp2@gmail.com>
In-Reply-To: <20211124193807.2093208-1-skpgkp2@gmail.com>
References: <20211124193807.2093208-1-skpgkp2@gmail.com>

Implement vectorized atan/atanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan/atanf with regenerated ulps.
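For readers following the ALGORITHM DESCRIPTION comment in svml_d_atan2_core_sse4.S, the range reduction can be modelled in scalar C roughly as below. This is only a sketch under stated assumptions, not the code in this patch: the names atan_sketch and atan_residual_sketch are hypothetical, the residual polynomial uses plain Taylor coefficients as a stand-in for the tuned Poly11 table, and the hard-coded atan(0.5), atan(1.0), atan(1.5) and Pi/2 values stand in for the table lookup done by the assembly.

```c
#include <assert.h>
#include <math.h>

/* Residual atan(s) ~= s + s^3 * P(s^2) for |s| <= 7/16.
   ASSUMPTION: P uses Taylor coefficients (-1)^(k+1) / (2k + 3),
   k = 0..11, not the tuned Poly11 coefficients from the patch.  */
static double
atan_residual_sketch (double s)
{
  double t = s * s;
  double p = 0.0;
  for (int k = 11; k >= 0; k--)
    {
      double coef = ((k % 2) ? 1.0 : -1.0) / (2.0 * k + 3.0);
      p = p * t + coef;		/* Horner step in t = s^2.  */
    }
  return s + s * t * p;
}

/* Hypothetical scalar model of the interval selection:
   atan(x) = atan(c) + atan(s), s = (x - c) / (1 + c * x).  */
double
atan_sketch (double x)
{
  static const double center[] = { 0.5, 1.0, 1.5 };
  static const double upper[] = { 11.0 / 16.0, 19.0 / 16.0, 39.0 / 16.0 };
  /* Stand-ins for the table entries atan(0.5), atan(1.0), atan(1.5).  */
  static const double atan_center[] = {
    0.4636476090008061, 0.7853981633974483, 0.982793723247329
  };
  double ax = fabs (x);
  double base, s;

  if (ax < 7.0 / 16.0)
    {
      base = 0.0;		/* c = 0: no reduction needed.  */
      s = ax;
    }
  else
    {
      base = 1.5707963267948966;	/* Large |x|: atan(inf) = Pi/2.  */
      s = -1.0 / ax;
      for (int i = 0; i < 3; i++)
	if (ax < upper[i])
	  {
	    base = atan_center[i];
	    s = (ax - center[i]) / (1.0 + center[i] * ax);
	    break;
	  }
    }

  /* The reduced argument stays inside the polynomial's interval
     (written so that a NaN input passes through).  */
  assert (!(fabs (s) > 7.0 / 16.0));

  /* atan is odd, so the sign of x is restored at the end.  */
  return copysign (base + atan_residual_sketch (s), x);
}
```

With the Taylor stand-in the result is only accurate to roughly 1e-11, which is enough to show the reduction but is far from the 1 ulp bound recorded in libm-test-ulps for the real implementation.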
--- bits/libm-simd-decl-stubs.h | 11 + math/bits/mathcalls.h | 2 +- .../unix/sysv/linux/x86_64/libmvec.abilist | 8 + sysdeps/x86/fpu/bits/math-vector.h | 4 + sysdeps/x86_64/fpu/Makeconfig | 1 + sysdeps/x86_64/fpu/Versions | 2 + sysdeps/x86_64/fpu/libm-test-ulps | 20 + .../fpu/multiarch/svml_d_atan2_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_d_atan2_core.c | 27 + .../fpu/multiarch/svml_d_atan2_core_sse4.S | 1659 +++++++++++++++++ .../fpu/multiarch/svml_d_atan4_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_d_atan4_core.c | 27 + .../fpu/multiarch/svml_d_atan4_core_avx2.S | 1521 +++++++++++++++ .../fpu/multiarch/svml_d_atan8_core-avx2.S | 20 + .../x86_64/fpu/multiarch/svml_d_atan8_core.c | 27 + .../fpu/multiarch/svml_d_atan8_core_avx512.S | 1323 +++++++++++++ .../fpu/multiarch/svml_s_atanf16_core-avx2.S | 20 + .../fpu/multiarch/svml_s_atanf16_core.c | 28 + .../multiarch/svml_s_atanf16_core_avx512.S | 352 ++++ .../fpu/multiarch/svml_s_atanf4_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_s_atanf4_core.c | 28 + .../fpu/multiarch/svml_s_atanf4_core_sse4.S | 884 +++++++++ .../fpu/multiarch/svml_s_atanf8_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_s_atanf8_core.c | 28 + .../fpu/multiarch/svml_s_atanf8_core_avx2.S | 688 +++++++ sysdeps/x86_64/fpu/svml_d_atan2_core.S | 29 + sysdeps/x86_64/fpu/svml_d_atan4_core.S | 29 + sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S | 25 + sysdeps/x86_64/fpu/svml_d_atan8_core.S | 25 + sysdeps/x86_64/fpu/svml_s_atanf16_core.S | 25 + sysdeps/x86_64/fpu/svml_s_atanf4_core.S | 29 + sysdeps/x86_64/fpu/svml_s_atanf8_core.S | 29 + sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S | 25 + .../x86_64/fpu/test-double-libmvec-atan-avx.c | 1 + .../fpu/test-double-libmvec-atan-avx2.c | 1 + .../fpu/test-double-libmvec-atan-avx512f.c | 1 + sysdeps/x86_64/fpu/test-double-libmvec-atan.c | 3 + .../x86_64/fpu/test-double-vlen2-wrappers.c | 1 + .../fpu/test-double-vlen4-avx2-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen4-wrappers.c | 1 + 
.../x86_64/fpu/test-double-vlen8-wrappers.c | 1 + .../x86_64/fpu/test-float-libmvec-atanf-avx.c | 1 + .../fpu/test-float-libmvec-atanf-avx2.c | 1 + .../fpu/test-float-libmvec-atanf-avx512f.c | 1 + sysdeps/x86_64/fpu/test-float-libmvec-atanf.c | 3 + .../x86_64/fpu/test-float-vlen16-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen4-wrappers.c | 1 + .../fpu/test-float-vlen8-avx2-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen8-wrappers.c | 1 + 49 files changed, 6995 insertions(+), 1 deletion(-) create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_atan2_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S create mode 100644 
sysdeps/x86_64/fpu/svml_d_atan8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf.c diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h index 02acfb1ef8..3e0aa043b4 100644 --- a/bits/libm-simd-decl-stubs.h +++ b/bits/libm-simd-decl-stubs.h @@ -142,4 +142,15 @@ #define __DECL_SIMD_asinhf32x #define __DECL_SIMD_asinhf64x #define __DECL_SIMD_asinhf128x + +#define __DECL_SIMD_atan +#define __DECL_SIMD_atanf +#define __DECL_SIMD_atanl +#define __DECL_SIMD_atanf16 +#define __DECL_SIMD_atanf32 +#define __DECL_SIMD_atanf64 +#define __DECL_SIMD_atanf128 +#define __DECL_SIMD_atanf32x +#define __DECL_SIMD_atanf64x +#define __DECL_SIMD_atanf128x #endif diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h index 9a7cfb57c7..f37dbeebfb 100644 --- a/math/bits/mathcalls.h +++ b/math/bits/mathcalls.h @@ -54,7 +54,7 @@ __MATHCALL_VEC (acos,, (_Mdouble_ __x)); /* Arc sine of X. */ __MATHCALL_VEC (asin,, (_Mdouble_ __x)); /* Arc tangent of X. */ -__MATHCALL (atan,, (_Mdouble_ __x)); +__MATHCALL_VEC (atan,, (_Mdouble_ __x)); /* Arc tangent of Y/X. 
*/ __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x)); diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist index e246e1ba32..2ead94d87e 100644 --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist @@ -50,31 +50,39 @@ GLIBC_2.35 _ZGVbN2v_acos F GLIBC_2.35 _ZGVbN2v_acosh F GLIBC_2.35 _ZGVbN2v_asin F GLIBC_2.35 _ZGVbN2v_asinh F +GLIBC_2.35 _ZGVbN2v_atan F GLIBC_2.35 _ZGVbN4v_acosf F GLIBC_2.35 _ZGVbN4v_acoshf F GLIBC_2.35 _ZGVbN4v_asinf F GLIBC_2.35 _ZGVbN4v_asinhf F +GLIBC_2.35 _ZGVbN4v_atanf F GLIBC_2.35 _ZGVcN4v_acos F GLIBC_2.35 _ZGVcN4v_acosh F GLIBC_2.35 _ZGVcN4v_asin F GLIBC_2.35 _ZGVcN4v_asinh F +GLIBC_2.35 _ZGVcN4v_atan F GLIBC_2.35 _ZGVcN8v_acosf F GLIBC_2.35 _ZGVcN8v_acoshf F GLIBC_2.35 _ZGVcN8v_asinf F GLIBC_2.35 _ZGVcN8v_asinhf F +GLIBC_2.35 _ZGVcN8v_atanf F GLIBC_2.35 _ZGVdN4v_acos F GLIBC_2.35 _ZGVdN4v_acosh F GLIBC_2.35 _ZGVdN4v_asin F GLIBC_2.35 _ZGVdN4v_asinh F +GLIBC_2.35 _ZGVdN4v_atan F GLIBC_2.35 _ZGVdN8v_acosf F GLIBC_2.35 _ZGVdN8v_acoshf F GLIBC_2.35 _ZGVdN8v_asinf F GLIBC_2.35 _ZGVdN8v_asinhf F +GLIBC_2.35 _ZGVdN8v_atanf F GLIBC_2.35 _ZGVeN16v_acosf F GLIBC_2.35 _ZGVeN16v_acoshf F GLIBC_2.35 _ZGVeN16v_asinf F GLIBC_2.35 _ZGVeN16v_asinhf F +GLIBC_2.35 _ZGVeN16v_atanf F GLIBC_2.35 _ZGVeN8v_acos F GLIBC_2.35 _ZGVeN8v_acosh F GLIBC_2.35 _ZGVeN8v_asin F GLIBC_2.35 _ZGVeN8v_asinh F +GLIBC_2.35 _ZGVeN8v_atan F diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h index f8f3d611b8..ef0a3fb7ed 100644 --- a/sysdeps/x86/fpu/bits/math-vector.h +++ b/sysdeps/x86/fpu/bits/math-vector.h @@ -74,6 +74,10 @@ # define __DECL_SIMD_asinh __DECL_SIMD_x86_64 # undef __DECL_SIMD_asinhf # define __DECL_SIMD_asinhf __DECL_SIMD_x86_64 +# undef __DECL_SIMD_atan +# define __DECL_SIMD_atan __DECL_SIMD_x86_64 +# undef __DECL_SIMD_atanf +# define __DECL_SIMD_atanf __DECL_SIMD_x86_64 # endif #endif diff --git 
a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig index 2bb482878b..c8418e68c8 100644 --- a/sysdeps/x86_64/fpu/Makeconfig +++ b/sysdeps/x86_64/fpu/Makeconfig @@ -26,6 +26,7 @@ libmvec-funcs = \ acosh \ asin \ asinh \ + atan \ cos \ exp \ log \ diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions index 0f24286493..f7ce07574f 100644 --- a/sysdeps/x86_64/fpu/Versions +++ b/sysdeps/x86_64/fpu/Versions @@ -18,9 +18,11 @@ libmvec { _ZGVbN2v_acosh; _ZGVcN4v_acosh; _ZGVdN4v_acosh; _ZGVeN8v_acosh; _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin; _ZGVbN2v_asinh; _ZGVcN4v_asinh; _ZGVdN4v_asinh; _ZGVeN8v_asinh; + _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan; _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf; _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf; _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf; _ZGVbN4v_asinhf; _ZGVcN8v_asinhf; _ZGVdN8v_asinhf; _ZGVeN16v_asinhf; + _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf; } } diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps index fb7086d2b9..de345e2bf1 100644 --- a/sysdeps/x86_64/fpu/libm-test-ulps +++ b/sysdeps/x86_64/fpu/libm-test-ulps @@ -221,6 +221,26 @@ float: 2 float128: 2 ldouble: 1 +Function: "atan_vlen16": +float: 1 + +Function: "atan_vlen2": +double: 1 + +Function: "atan_vlen4": +double: 1 +float: 1 + +Function: "atan_vlen4_avx2": +double: 1 + +Function: "atan_vlen8": +double: 1 +float: 1 + +Function: "atan_vlen8_avx2": +float: 1 + Function: "atanh": double: 2 float: 2 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S new file mode 100644 index 0000000000..115e5223aa --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized atan, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. 
+ This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVbN2v_atan _ZGVbN2v_atan_sse2 +#include "../svml_d_atan2_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c new file mode 100644 index 0000000000..93f079ffcb --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized atan, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define SYMBOL_NAME _ZGVbN2v_atan +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN2v_atan, __GI__ZGVbN2v_atan, __redirect__ZGVbN2v_atan) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S new file mode 100644 index 0000000000..8be2997c0b --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S @@ -0,0 +1,1659 @@ +/* Function atan vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * For 0.0 <= x <= 7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x) + * For 7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x) + * For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x) + * For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x) + * For 39.0/16.0 <= x <= inf : atan(x) = atan(inf) + atan(s), where s=-1.0/x + * Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0. 
+ * + */ + +#include + + .text +ENTRY(_ZGVbN2v_atan_sse4) + lea 576+__svml_datan_data_internal_avx512(%rip), %rcx + movups __svml_datan_data_internal_avx512(%rip), %xmm1 + movups 64+__svml_datan_data_internal_avx512(%rip), %xmm4 + andps %xmm0, %xmm1 + movaps %xmm4, %xmm6 + movaps %xmm1, %xmm2 + addpd %xmm1, %xmm6 + movaps %xmm6, %xmm5 + pxor %xmm1, %xmm0 + subpd %xmm4, %xmm5 + subpd %xmm5, %xmm2 + mulpd %xmm1, %xmm5 + movups 128+__svml_datan_data_internal_avx512(%rip), %xmm7 + cmplepd %xmm1, %xmm7 + addpd 256+__svml_datan_data_internal_avx512(%rip), %xmm5 + +/* saturate X range */ + movups 320+__svml_datan_data_internal_avx512(%rip), %xmm3 + movaps %xmm7, %xmm4 + minpd %xmm1, %xmm3 + movups %xmm15, -24(%rsp) + cfi_offset(32, -32) + movaps %xmm7, %xmm15 + movups 192+__svml_datan_data_internal_avx512(%rip), %xmm1 + andnps %xmm2, %xmm4 + andps %xmm7, %xmm1 + andnps %xmm5, %xmm15 + andps %xmm7, %xmm3 + orps %xmm1, %xmm4 + orps %xmm3, %xmm15 + +/* + * table lookup sequence + * VPERMUTE not available + */ + movaps %xmm6, %xmm3 + +/* R+Rl = DiffX/Y */ + divpd %xmm15, %xmm4 + psllq $3, %xmm3 + pand .FLT_15(%rip), %xmm3 + movd %xmm3, %eax + pshufd $2, %xmm3, %xmm5 + movups 960+__svml_datan_data_internal_avx512(%rip), %xmm1 + movd %xmm5, %edx + cmplepd %xmm6, %xmm1 + movslq %eax, %rax + movaps %xmm1, %xmm6 + movslq %edx, %rdx + movsd -128(%rax,%rcx), %xmm2 + movhpd -128(%rdx,%rcx), %xmm2 + movsd (%rcx,%rax), %xmm3 + andnps %xmm2, %xmm6 + movhpd (%rcx,%rdx), %xmm3 + +/* set table value to Pi/2 for large X */ + movups 1024+__svml_datan_data_internal_avx512(%rip), %xmm2 + andps %xmm1, %xmm3 + movaps %xmm7, %xmm1 + andps %xmm7, %xmm2 + +/* polynomial evaluation */ + movaps %xmm4, %xmm7 + orps %xmm3, %xmm6 + mulpd %xmm4, %xmm7 + andnps %xmm6, %xmm1 + movaps %xmm7, %xmm5 + orps %xmm2, %xmm1 + movaps %xmm7, %xmm15 + movups 1152+__svml_datan_data_internal_avx512(%rip), %xmm2 + mulpd %xmm7, %xmm2 + mulpd %xmm7, %xmm5 + addpd 1168+__svml_datan_data_internal_avx512(%rip), %xmm2 + mulpd 
%xmm4, %xmm15 + mulpd %xmm5, %xmm2 + movups 1184+__svml_datan_data_internal_avx512(%rip), %xmm3 + mulpd %xmm7, %xmm3 + addpd 1200+__svml_datan_data_internal_avx512(%rip), %xmm3 + movups 1216+__svml_datan_data_internal_avx512(%rip), %xmm6 + mulpd %xmm7, %xmm6 + addpd %xmm2, %xmm3 + addpd 1232+__svml_datan_data_internal_avx512(%rip), %xmm6 + mulpd %xmm3, %xmm5 + addpd %xmm5, %xmm6 + mulpd %xmm6, %xmm15 + addpd %xmm15, %xmm4 + addpd %xmm4, %xmm1 + movups -24(%rsp), %xmm15 + cfi_restore(32) + pxor %xmm1, %xmm0 + ret + +END(_ZGVbN2v_atan_sse4) + + .align 16,0x90 + +__svml_datan_cout_rare_internal: + + cfi_startproc + + movzwl 6(%rdi), %r8d + andl $32752, %r8d + shrl $4, %r8d + cmpl $2047, %r8d + je .LBL_2_12 + movq (%rdi), %rdx + movq %rdx, -16(%rsp) + shrq $56, %rdx + movb 7(%rdi), %al + andl $127, %edx + movb %dl, -9(%rsp) + movsd -16(%rsp), %xmm0 + shrb $7, %al + comisd 1888+__datan_la_CoutTab(%rip), %xmm0 + movl -12(%rsp), %ecx + jb .LBL_2_6 + movsd 1896+__datan_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_5 + movl 4(%rdi), %edx + movl %ecx, %edi + andl $-524288, %ecx + andl $-1048576, %edi + addl $262144, %ecx + movaps %xmm0, %xmm9 + andl $1048575, %ecx + movaps %xmm0, %xmm10 + movsd %xmm0, -56(%rsp) + orl %ecx, %edi + movl $0, -56(%rsp) + andl $1048575, %edx + movl %edi, -52(%rsp) + lea __datan_la_CoutTab(%rip), %rcx + movsd 1928+__datan_la_CoutTab(%rip), %xmm4 + movsd -56(%rsp), %xmm15 + shll $20, %r8d + subsd -56(%rsp), %xmm9 + mulsd 1928+__datan_la_CoutTab(%rip), %xmm10 + shlb $7, %al + mulsd %xmm9, %xmm4 + movsd %xmm4, -48(%rsp) + orl %edx, %r8d + movsd -48(%rsp), %xmm5 + addl $-1069547520, %r8d + sarl $18, %r8d + subsd %xmm9, %xmm5 + movsd %xmm5, -40(%rsp) + andl $-2, %r8d + movsd -48(%rsp), %xmm7 + movsd -40(%rsp), %xmm6 + movslq %r8d, %r8 + subsd %xmm6, %xmm7 + movsd %xmm7, -48(%rsp) + movsd -48(%rsp), %xmm8 + movsd 1904+__datan_la_CoutTab(%rip), %xmm6 + subsd %xmm8, %xmm9 + movsd %xmm9, -40(%rsp) + movsd -48(%rsp), %xmm2 + movsd -40(%rsp), 
%xmm3 + movsd %xmm10, -48(%rsp) + movsd -48(%rsp), %xmm11 + movsd 1904+__datan_la_CoutTab(%rip), %xmm8 + subsd -16(%rsp), %xmm11 + movsd %xmm11, -40(%rsp) + movsd -48(%rsp), %xmm13 + movsd -40(%rsp), %xmm12 + subsd %xmm12, %xmm13 + movsd %xmm13, -48(%rsp) + movsd -48(%rsp), %xmm14 + subsd %xmm14, %xmm0 + movsd 1904+__datan_la_CoutTab(%rip), %xmm14 + movsd %xmm0, -40(%rsp) + movsd -48(%rsp), %xmm5 + movsd -40(%rsp), %xmm4 + mulsd %xmm15, %xmm5 + mulsd %xmm15, %xmm4 + movaps %xmm5, %xmm1 + addsd %xmm4, %xmm1 + movsd %xmm1, -48(%rsp) + movsd -48(%rsp), %xmm0 + subsd %xmm0, %xmm5 + addsd %xmm4, %xmm5 + movsd 1928+__datan_la_CoutTab(%rip), %xmm4 + movsd %xmm5, -40(%rsp) + movsd -48(%rsp), %xmm11 + movsd -40(%rsp), %xmm1 + addsd %xmm11, %xmm6 + movsd %xmm6, -48(%rsp) + movsd -48(%rsp), %xmm7 + subsd %xmm7, %xmm8 + movsd %xmm8, -40(%rsp) + movsd -48(%rsp), %xmm10 + movsd -40(%rsp), %xmm9 + addsd %xmm9, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -40(%rsp), %xmm12 + movsd 1928+__datan_la_CoutTab(%rip), %xmm10 + addsd %xmm11, %xmm12 + movsd %xmm12, -40(%rsp) + movsd -32(%rsp), %xmm13 + movsd 1904+__datan_la_CoutTab(%rip), %xmm11 + subsd %xmm13, %xmm14 + movsd %xmm14, -32(%rsp) + movsd -40(%rsp), %xmm0 + movsd -32(%rsp), %xmm15 + addsd %xmm15, %xmm0 + movsd %xmm0, -32(%rsp) + movsd -48(%rsp), %xmm9 + mulsd %xmm9, %xmm4 + movsd -32(%rsp), %xmm0 + movsd %xmm4, -48(%rsp) + addsd %xmm1, %xmm0 + movsd -48(%rsp), %xmm5 + subsd %xmm9, %xmm5 + movsd %xmm5, -40(%rsp) + movsd -48(%rsp), %xmm7 + movsd -40(%rsp), %xmm6 + subsd %xmm6, %xmm7 + movsd 1904+__datan_la_CoutTab(%rip), %xmm6 + movsd %xmm7, -48(%rsp) + movsd -48(%rsp), %xmm8 + subsd %xmm8, %xmm9 + movsd %xmm9, -40(%rsp) + movsd -48(%rsp), %xmm4 + divsd %xmm4, %xmm11 + mulsd %xmm11, %xmm10 + movsd -40(%rsp), %xmm5 + movsd %xmm10, -40(%rsp) + addsd %xmm0, %xmm5 + movsd -40(%rsp), %xmm12 + subsd %xmm11, %xmm12 + movsd %xmm12, -32(%rsp) + movsd -40(%rsp), %xmm10 + movsd -32(%rsp), %xmm13 + subsd %xmm13, %xmm10 + movsd %xmm10, -32(%rsp) 
+ movsd -32(%rsp), %xmm14 + mulsd %xmm14, %xmm4 + movsd -32(%rsp), %xmm15 + subsd %xmm4, %xmm6 + mulsd %xmm15, %xmm5 + movsd %xmm5, -40(%rsp) + movsd -40(%rsp), %xmm1 + subsd %xmm1, %xmm6 + movsd %xmm6, -40(%rsp) + movsd -40(%rsp), %xmm4 + movsd -32(%rsp), %xmm5 + movsd -40(%rsp), %xmm0 + movaps %xmm5, %xmm7 + movsd -32(%rsp), %xmm1 + mulsd %xmm3, %xmm5 + addsd 1904+__datan_la_CoutTab(%rip), %xmm4 + mulsd %xmm2, %xmm7 + mulsd %xmm0, %xmm4 + mulsd %xmm1, %xmm4 + mulsd %xmm4, %xmm3 + mulsd %xmm4, %xmm2 + addsd %xmm3, %xmm5 + movsd 1872+__datan_la_CoutTab(%rip), %xmm6 + addsd %xmm2, %xmm5 + movsd %xmm5, -48(%rsp) + movaps %xmm7, %xmm2 + movsd -48(%rsp), %xmm4 + addsd %xmm4, %xmm2 + movsd %xmm2, -48(%rsp) + movsd -48(%rsp), %xmm3 + movsd (%rcx,%r8,8), %xmm2 + subsd %xmm3, %xmm7 + addsd %xmm4, %xmm7 + movsd %xmm7, -40(%rsp) + movsd -48(%rsp), %xmm3 + movaps %xmm3, %xmm5 + movaps %xmm3, %xmm0 + mulsd %xmm3, %xmm5 + addsd %xmm2, %xmm0 + mulsd %xmm5, %xmm6 + movsd -40(%rsp), %xmm10 + movsd %xmm0, -48(%rsp) + movsd -48(%rsp), %xmm1 + addsd 1864+__datan_la_CoutTab(%rip), %xmm6 + subsd %xmm1, %xmm2 + mulsd %xmm5, %xmm6 + addsd %xmm3, %xmm2 + addsd 1856+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + movsd %xmm2, -40(%rsp) + movsd -48(%rsp), %xmm9 + movsd -40(%rsp), %xmm8 + addsd 1848+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1840+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1832+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1824+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + mulsd %xmm3, %xmm6 + addsd %xmm6, %xmm10 + addsd 8(%rcx,%r8,8), %xmm10 + addsd %xmm8, %xmm10 + addsd %xmm9, %xmm10 + movsd %xmm10, -24(%rsp) + movb -17(%rsp), %r9b + andb $127, %r9b + orb %al, %r9b + movb %r9b, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_5: + movsd 1912+__datan_la_CoutTab(%rip), %xmm0 + shlb $7, %al + addsd 1920+__datan_la_CoutTab(%rip), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl 
+ andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_6: + comisd 1880+__datan_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movaps %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + shlb $7, %al + movsd 1872+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1864+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1856+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1848+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1840+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1832+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1824+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + mulsd %xmm0, %xmm2 + addsd %xmm0, %xmm2 + movsd %xmm2, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_8: + movzwl -10(%rsp), %edx + testl $32752, %edx + je .LBL_2_10 + movsd 1904+__datan_la_CoutTab(%rip), %xmm1 + shlb $7, %al + addsd %xmm0, %xmm1 + movsd %xmm1, -48(%rsp) + movsd -48(%rsp), %xmm0 + mulsd -16(%rsp), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_10: + mulsd %xmm0, %xmm0 + shlb $7, %al + movsd %xmm0, -48(%rsp) + movsd -48(%rsp), %xmm0 + addsd -16(%rsp), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + +.LBL_2_11: + xorl %eax, %eax + ret + +.LBL_2_12: + testl $1048575, 4(%rdi) + jne .LBL_2_15 + cmpl $0, (%rdi) + jne .LBL_2_15 + movsd 1912+__datan_la_CoutTab(%rip), %xmm0 + movb 7(%rdi), %al + andb $-128, %al + addsd 1920+__datan_la_CoutTab(%rip), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rcx + movq %rcx, (%rsi) + jmp .LBL_2_11 + +.LBL_2_15: + 
movsd (%rdi), %xmm0 + addsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + jmp .LBL_2_11 + + cfi_endproc + + .type __svml_datan_cout_rare_internal,@function + .size __svml_datan_cout_rare_internal,.-__svml_datan_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_datan_data_internal_avx512: + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 0 + .long 0 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 0 + .long 4180443357 + .long 1070553973 + .long 90291023 + .long 1071492199 + .long 2737217249 + .long 1071945615 + .long 1413754136 + .long 1072243195 + .long 1468297118 + .long 1072475260 + .long 3531732635 + .long 1072657163 + .long 744202399 + .long 1072747407 + .long 2464923204 + .long 1072805601 + .long 1436891685 + .long 1072853231 + .long 2037009832 + .long 1072892781 + .long 1826698067 + .long 1072926058 + .long 1803191648 + .long 1072954391 + .long 2205372832 + .long 1072978772 + .long 4234512805 + .long 1072999952 + .long 3932628503 + .long 1073018509 + .long 2501811453 + .long 1073034892 + .long 866379431 + .long 1073049455 + .long 1376865888 + .long 1073062480 + .long 3290094269 + .long 1073074195 + .long 354764887 + .long 1073084787 + .long 3332975497 + .long 1073094406 + .long 1141460092 + .long 1073103181 + .long 745761286 + .long 1073111216 + .long 1673304509 + .long 1073118600 + .long 983388243 + .long 1073125409 + .long 3895509104 + .long 1073131706 + .long 2128523669 + .long 1073137548 + .long 2075485693 + .long 1073142981 + .long 121855980 + .long 1073148047 + .long 4181733783 + .long 1073152780 + .long 2887813284 + .long 1073157214 + .long 0 + .long 0 + .long 1022865341 + .long 1013492590 + .long 573531618 + .long 1014639487 + .long 2280825944 + .long 1014120858 + .long 856972295 + .long 1015129638 + .long 986810987 + .long 1015077601 + .long 2062601149 + .long 1013974920 + .long 589036912 + .long 3164328156 + .long 1787331214 + .long 1016798022 + .long 2942272763 + .long 3164235441 + .long 2956702105 + .long 1016472908 + .long 3903328092 + .long 3162582135 + .long 3175026820 + .long 3158589859 + .long 787328196 + .long 1014621351 + .long 2317874517 + .long 3163795518 + .long 4071621134 + .long 1016673529 + .long 2492111345 + .long 3164172103 + .long 3606178875 + .long 3162371821 + .long 3365790232 + .long 1014547152 + .long 2710887773 + .long 1017086651 + 
.long 2755350986 + .long 3162706257 + .long 198095269 + .long 3162802133 + .long 2791076759 + .long 3164364640 + .long 4214434319 + .long 3162164074 + .long 773754012 + .long 3164190653 + .long 139561443 + .long 3164313657 + .long 2197796619 + .long 3164066219 + .long 3592486882 + .long 1016669082 + .long 1148791015 + .long 3163724934 + .long 386789398 + .long 3163117479 + .long 2518816264 + .long 3162291736 + .long 2545101323 + .long 3164592727 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 
+ .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_datan_data_internal_avx512,@object + .size __svml_datan_data_internal_avx512,1280 + .align 32 + +__datan_la_CoutTab: + .long 3892314112 + .long 1069799150 + .long 2332892550 + .long 1039715405 + .long 1342177280 + .long 1070305495 + .long 270726690 + .long 1041535749 + .long 939524096 + .long 1070817911 + .long 2253973841 + .long 3188654726 + .long 3221225472 + .long 1071277294 + .long 3853927037 + .long 1043226911 + .long 2818572288 + .long 1071767563 + .long 2677759107 + .long 1044314101 + .long 3355443200 + .long 1072103591 + .long 1636578514 + .long 3191094734 + .long 1476395008 + .long 1072475260 + .long 1864703685 + .long 3188646936 + .long 805306368 + .long 1072747407 + .long 192551812 + .long 3192726267 + .long 2013265920 + .long 1072892781 + .long 2240369452 + .long 1043768538 + .long 0 + .long 1072999953 + .long 3665168337 + .long 3192705970 + .long 402653184 + .long 1073084787 + .long 1227953434 + .long 3192313277 + .long 2013265920 + .long 1073142981 + .long 3853283127 + .long 1045277487 + .long 805306368 + .long 1073187261 + .long 1676192264 + .long 3192868861 + .long 134217728 
+ .long 1073217000 + .long 4290763938 + .long 1042034855 + .long 671088640 + .long 1073239386 + .long 994303084 + .long 3189643768 + .long 402653184 + .long 1073254338 + .long 1878067156 + .long 1042652475 + .long 1610612736 + .long 1073265562 + .long 670314820 + .long 1045138554 + .long 3221225472 + .long 1073273048 + .long 691126919 + .long 3189987794 + .long 3489660928 + .long 1073278664 + .long 1618990832 + .long 3188194509 + .long 1207959552 + .long 1073282409 + .long 2198872939 + .long 1044806069 + .long 3489660928 + .long 1073285217 + .long 2633982383 + .long 1042307894 + .long 939524096 + .long 1073287090 + .long 1059367786 + .long 3189114230 + .long 2281701376 + .long 1073288494 + .long 3158525533 + .long 1044484961 + .long 3221225472 + .long 1073289430 + .long 286581777 + .long 1044893263 + .long 4026531840 + .long 1073290132 + .long 2000245215 + .long 3191647611 + .long 134217728 + .long 1073290601 + .long 4205071590 + .long 1045035927 + .long 536870912 + .long 1073290952 + .long 2334392229 + .long 1043447393 + .long 805306368 + .long 1073291186 + .long 2281458177 + .long 3188885569 + .long 3087007744 + .long 1073291361 + .long 691611507 + .long 1044733832 + .long 3221225472 + .long 1073291478 + .long 1816229550 + .long 1044363390 + .long 2281701376 + .long 1073291566 + .long 1993843750 + .long 3189837440 + .long 134217728 + .long 1073291625 + .long 3654754496 + .long 1044970837 + .long 4026531840 + .long 1073291668 + .long 3224300229 + .long 3191935390 + .long 805306368 + .long 1073291698 + .long 2988777976 + .long 3188950659 + .long 536870912 + .long 1073291720 + .long 1030371341 + .long 1043402665 + .long 3221225472 + .long 1073291734 + .long 1524463765 + .long 1044361356 + .long 3087007744 + .long 1073291745 + .long 2754295320 + .long 1044731036 + .long 134217728 + .long 1073291753 + .long 3099629057 + .long 1044970710 + .long 2281701376 + .long 1073291758 + .long 962914160 + .long 3189838838 + .long 805306368 + .long 1073291762 + .long 3543908206 + 
.long 3188950786 + .long 4026531840 + .long 1073291764 + .long 1849909620 + .long 3191935434 + .long 3221225472 + .long 1073291766 + .long 1641333636 + .long 1044361352 + .long 536870912 + .long 1073291768 + .long 1373968792 + .long 1043402654 + .long 134217728 + .long 1073291769 + .long 2033191599 + .long 1044970710 + .long 3087007744 + .long 1073291769 + .long 4117947437 + .long 1044731035 + .long 805306368 + .long 1073291770 + .long 315378368 + .long 3188950787 + .long 2281701376 + .long 1073291770 + .long 2428571750 + .long 3189838838 + .long 3221225472 + .long 1073291770 + .long 1608007466 + .long 1044361352 + .long 4026531840 + .long 1073291770 + .long 1895711420 + .long 3191935434 + .long 134217728 + .long 1073291771 + .long 2031108713 + .long 1044970710 + .long 536870912 + .long 1073291771 + .long 1362518342 + .long 1043402654 + .long 805306368 + .long 1073291771 + .long 317461253 + .long 3188950787 + .long 939524096 + .long 1073291771 + .long 4117231784 + .long 1044731035 + .long 1073741824 + .long 1073291771 + .long 1607942376 + .long 1044361352 + .long 1207959552 + .long 1073291771 + .long 2428929577 + .long 3189838838 + .long 1207959552 + .long 1073291771 + .long 2031104645 + .long 1044970710 + .long 1342177280 + .long 1073291771 + .long 1895722602 + .long 3191935434 + .long 1342177280 + .long 1073291771 + .long 317465322 + .long 3188950787 + .long 1342177280 + .long 1073291771 + .long 1362515546 + .long 1043402654 + .long 1342177280 + .long 1073291771 + .long 1607942248 + .long 1044361352 + .long 1342177280 + .long 1073291771 + .long 4117231610 + .long 1044731035 + .long 1342177280 + .long 1073291771 + .long 2031104637 + .long 1044970710 + .long 1342177280 + .long 1073291771 + .long 1540251232 + .long 1045150466 + .long 1342177280 + .long 1073291771 + .long 2644671394 + .long 1045270303 + .long 1342177280 + .long 1073291771 + .long 2399244691 + .long 1045360181 + .long 1342177280 + .long 1073291771 + .long 803971124 + .long 1045420100 + .long 
1476395008 + .long 1073291771 + .long 3613709523 + .long 3192879152 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192849193 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192826724 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192811744 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192800509 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192793019 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192787402 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192783657 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192780848 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192778976 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192777572 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192776635 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192775933 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192775465 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192775114 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774880 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774704 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774587 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774500 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774441 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774397 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774368 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774346 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774331 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774320 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774313 + .long 1476395008 + .long 
1073291771 + .long 177735686 + .long 3192774308 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774304 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774301 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774299 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774298 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774297 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1466225875 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1343512524 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1251477510 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1190120835 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1144103328 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1113424990 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1090416237 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1075077068 + .long 3192774295 + .long 1431655765 + .long 3218429269 + .long 2576978363 + .long 1070176665 + .long 2453154343 + .long 3217180964 + .long 4189149139 + .long 1069314502 + .long 1775019125 + .long 3216459198 + .long 273199057 + .long 1068739452 + .long 874748308 + .long 3215993277 + .long 0 + .long 1017118720 + .long 0 + .long 1069547520 + .long 0 + .long 1129316352 + 
.long 0 + .long 1072693248 + .long 1413754136 + .long 1073291771 + .long 856972295 + .long 1016178214 + .long 33554432 + .long 1101004800 + .type __datan_la_CoutTab,@object + .size __datan_la_CoutTab,1936 + .align 16 + +.FLT_15: + .long 0x00000078,0x00000000,0x00000078,0x00000000 + .type .FLT_15,@object + .size .FLT_15,16 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S new file mode 100644 index 0000000000..79c48dbc91 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized atan, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVdN4v_atan _ZGVdN4v_atan_sse_wrapper +#include "../svml_d_atan4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c new file mode 100644 index 0000000000..64ce66b9fd --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized atan, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVdN4v_atan +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4v_atan, __GI__ZGVdN4v_atan, __redirect__ZGVdN4v_atan) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S new file mode 100644 index 0000000000..ac2797e80a --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S @@ -0,0 +1,1521 @@ +/* Function atan vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * For 0.0 <= x <= 7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x) + * For 7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x) + * For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x) + * For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x) + * For 39.0/16.0 <= x <= inf : atan(x) = atan(inf) + atan(s), where s=-1.0/x + * Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0. + * + */ + +#include + + .text +ENTRY(_ZGVdN4v_atan_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + lea 576+__svml_datan_data_internal_avx512(%rip), %rdx + vmovupd 64+__svml_datan_data_internal_avx512(%rip), %ymm7 + vmovupd 256+__svml_datan_data_internal_avx512(%rip), %ymm2 + vmovups %ymm15, -64(%rsp) + vandpd __svml_datan_data_internal_avx512(%rip), %ymm0, %ymm3 + vaddpd %ymm7, %ymm3, %ymm6 + vcmpge_oqpd 128+__svml_datan_data_internal_avx512(%rip), %ymm3, %ymm5 + vsubpd %ymm7, %ymm6, %ymm1 + vxorpd %ymm0, %ymm3, %ymm4 + +/* saturate X range */ + vmovupd 320+__svml_datan_data_internal_avx512(%rip), %ymm0 + vfmadd231pd %ymm3, %ymm1, %ymm2 + vsubpd %ymm1, %ymm3, %ymm7 + vminpd %ymm3, %ymm0, %ymm3 + +/* + * table lookup sequence + * VPERMUTE not available + */ + vpsllq $3, %ymm6, %ymm0 + vcmpge_oqpd 960+__svml_datan_data_internal_avx512(%rip), %ymm6, %ymm6 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + vblendvpd %ymm5, 192+__svml_datan_data_internal_avx512(%rip), %ymm7, %ymm15 + vblendvpd %ymm5, %ymm3, %ymm2, %ymm2 + +/* R+Rl = DiffX/Y */ + vdivpd %ymm2, %ymm15, %ymm1 + vpand .FLT_15(%rip), %ymm0, %ymm2 + vextractf128
$1, %ymm2, %xmm15 + vmovd %xmm2, %eax + vmovd %xmm15, %r8d + movslq %eax, %rax + vpextrd $2, %xmm2, %ecx + movslq %r8d, %r8 + vpextrd $2, %xmm15, %r9d + movslq %ecx, %rcx + movslq %r9d, %r9 + vmovsd -128(%rax,%rdx), %xmm3 + vmovsd -128(%r8,%rdx), %xmm0 + vmovhpd -128(%rcx,%rdx), %xmm3, %xmm7 + vmovhpd -128(%r9,%rdx), %xmm0, %xmm2 + vmovsd (%rdx,%rax), %xmm3 + vmovhpd (%rdx,%rcx), %xmm3, %xmm15 + vinsertf128 $1, %xmm2, %ymm7, %ymm0 + vmovsd (%rdx,%r8), %xmm2 + vmovhpd (%rdx,%r9), %xmm2, %xmm3 + vinsertf128 $1, %xmm3, %ymm15, %ymm7 + vblendvpd %ymm6, %ymm7, %ymm0, %ymm0 + +/* polynomial evaluation */ + vmulpd %ymm1, %ymm1, %ymm3 + vmovupd 1152+__svml_datan_data_internal_avx512(%rip), %ymm15 + +/* set table value to Pi/2 for large X */ + vblendvpd %ymm5, 1024+__svml_datan_data_internal_avx512(%rip), %ymm0, %ymm7 + vmovupd 1216+__svml_datan_data_internal_avx512(%rip), %ymm0 + vfmadd213pd 1184+__svml_datan_data_internal_avx512(%rip), %ymm3, %ymm15 + vmulpd %ymm3, %ymm3, %ymm2 + vmovupd 1280+__svml_datan_data_internal_avx512(%rip), %ymm5 + vmulpd %ymm3, %ymm1, %ymm6 + vfmadd213pd 1248+__svml_datan_data_internal_avx512(%rip), %ymm3, %ymm0 + vfmadd213pd 1312+__svml_datan_data_internal_avx512(%rip), %ymm5, %ymm3 + vfmadd213pd %ymm0, %ymm2, %ymm15 + vfmadd213pd %ymm3, %ymm2, %ymm15 + vfmadd213pd %ymm1, %ymm6, %ymm15 + vaddpd %ymm15, %ymm7, %ymm1 + vmovups -64(%rsp), %ymm15 + cfi_restore(98) + vxorpd %ymm4, %ymm1, %ymm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + +END(_ZGVdN4v_atan_avx2) + + .align 16,0x90 + +__svml_datan_cout_rare_internal: + + cfi_startproc + + movzwl 6(%rdi), %r8d + andl $32752, %r8d + shrl $4, %r8d + cmpl $2047, %r8d + je .LBL_2_12 + movq (%rdi), %rdx + movq %rdx, -16(%rsp) + shrq $56, %rdx + movb 7(%rdi), %al + andl $127, %edx + movb %dl, -9(%rsp) + movsd -16(%rsp), %xmm0 + shrb $7, %al + comisd 1888+__datan_la_CoutTab(%rip), %xmm0 + movl -12(%rsp), %ecx + jb .LBL_2_6 + movsd 1896+__datan_la_CoutTab(%rip), %xmm1 + comisd 
%xmm0, %xmm1 + jbe .LBL_2_5 + movl 4(%rdi), %edx + movl %ecx, %edi + andl $-524288, %ecx + andl $-1048576, %edi + addl $262144, %ecx + movaps %xmm0, %xmm9 + andl $1048575, %ecx + movaps %xmm0, %xmm10 + movsd %xmm0, -56(%rsp) + orl %ecx, %edi + movl $0, -56(%rsp) + andl $1048575, %edx + movl %edi, -52(%rsp) + lea __datan_la_CoutTab(%rip), %rcx + movsd 1928+__datan_la_CoutTab(%rip), %xmm4 + movsd -56(%rsp), %xmm15 + shll $20, %r8d + subsd -56(%rsp), %xmm9 + mulsd 1928+__datan_la_CoutTab(%rip), %xmm10 + shlb $7, %al + mulsd %xmm9, %xmm4 + movsd %xmm4, -48(%rsp) + orl %edx, %r8d + movsd -48(%rsp), %xmm5 + addl $-1069547520, %r8d + sarl $18, %r8d + subsd %xmm9, %xmm5 + movsd %xmm5, -40(%rsp) + andl $-2, %r8d + movsd -48(%rsp), %xmm7 + movsd -40(%rsp), %xmm6 + movslq %r8d, %r8 + subsd %xmm6, %xmm7 + movsd %xmm7, -48(%rsp) + movsd -48(%rsp), %xmm8 + movsd 1904+__datan_la_CoutTab(%rip), %xmm6 + subsd %xmm8, %xmm9 + movsd %xmm9, -40(%rsp) + movsd -48(%rsp), %xmm2 + movsd -40(%rsp), %xmm3 + movsd %xmm10, -48(%rsp) + movsd -48(%rsp), %xmm11 + movsd 1904+__datan_la_CoutTab(%rip), %xmm8 + subsd -16(%rsp), %xmm11 + movsd %xmm11, -40(%rsp) + movsd -48(%rsp), %xmm13 + movsd -40(%rsp), %xmm12 + subsd %xmm12, %xmm13 + movsd %xmm13, -48(%rsp) + movsd -48(%rsp), %xmm14 + subsd %xmm14, %xmm0 + movsd 1904+__datan_la_CoutTab(%rip), %xmm14 + movsd %xmm0, -40(%rsp) + movsd -48(%rsp), %xmm5 + movsd -40(%rsp), %xmm4 + mulsd %xmm15, %xmm5 + mulsd %xmm15, %xmm4 + movaps %xmm5, %xmm1 + addsd %xmm4, %xmm1 + movsd %xmm1, -48(%rsp) + movsd -48(%rsp), %xmm0 + subsd %xmm0, %xmm5 + addsd %xmm4, %xmm5 + movsd 1928+__datan_la_CoutTab(%rip), %xmm4 + movsd %xmm5, -40(%rsp) + movsd -48(%rsp), %xmm11 + movsd -40(%rsp), %xmm1 + addsd %xmm11, %xmm6 + movsd %xmm6, -48(%rsp) + movsd -48(%rsp), %xmm7 + subsd %xmm7, %xmm8 + movsd %xmm8, -40(%rsp) + movsd -48(%rsp), %xmm10 + movsd -40(%rsp), %xmm9 + addsd %xmm9, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -40(%rsp), %xmm12 + movsd 1928+__datan_la_CoutTab(%rip), 
%xmm10 + addsd %xmm11, %xmm12 + movsd %xmm12, -40(%rsp) + movsd -32(%rsp), %xmm13 + movsd 1904+__datan_la_CoutTab(%rip), %xmm11 + subsd %xmm13, %xmm14 + movsd %xmm14, -32(%rsp) + movsd -40(%rsp), %xmm0 + movsd -32(%rsp), %xmm15 + addsd %xmm15, %xmm0 + movsd %xmm0, -32(%rsp) + movsd -48(%rsp), %xmm9 + mulsd %xmm9, %xmm4 + movsd -32(%rsp), %xmm0 + movsd %xmm4, -48(%rsp) + addsd %xmm1, %xmm0 + movsd -48(%rsp), %xmm5 + subsd %xmm9, %xmm5 + movsd %xmm5, -40(%rsp) + movsd -48(%rsp), %xmm7 + movsd -40(%rsp), %xmm6 + subsd %xmm6, %xmm7 + movsd 1904+__datan_la_CoutTab(%rip), %xmm6 + movsd %xmm7, -48(%rsp) + movsd -48(%rsp), %xmm8 + subsd %xmm8, %xmm9 + movsd %xmm9, -40(%rsp) + movsd -48(%rsp), %xmm4 + divsd %xmm4, %xmm11 + mulsd %xmm11, %xmm10 + movsd -40(%rsp), %xmm5 + movsd %xmm10, -40(%rsp) + addsd %xmm0, %xmm5 + movsd -40(%rsp), %xmm12 + subsd %xmm11, %xmm12 + movsd %xmm12, -32(%rsp) + movsd -40(%rsp), %xmm10 + movsd -32(%rsp), %xmm13 + subsd %xmm13, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -32(%rsp), %xmm14 + mulsd %xmm14, %xmm4 + movsd -32(%rsp), %xmm15 + subsd %xmm4, %xmm6 + mulsd %xmm15, %xmm5 + movsd %xmm5, -40(%rsp) + movsd -40(%rsp), %xmm1 + subsd %xmm1, %xmm6 + movsd %xmm6, -40(%rsp) + movsd -40(%rsp), %xmm4 + movsd -32(%rsp), %xmm5 + movsd -40(%rsp), %xmm0 + movaps %xmm5, %xmm7 + movsd -32(%rsp), %xmm1 + mulsd %xmm3, %xmm5 + addsd 1904+__datan_la_CoutTab(%rip), %xmm4 + mulsd %xmm2, %xmm7 + mulsd %xmm0, %xmm4 + mulsd %xmm1, %xmm4 + mulsd %xmm4, %xmm3 + mulsd %xmm4, %xmm2 + addsd %xmm3, %xmm5 + movsd 1872+__datan_la_CoutTab(%rip), %xmm6 + addsd %xmm2, %xmm5 + movsd %xmm5, -48(%rsp) + movaps %xmm7, %xmm2 + movsd -48(%rsp), %xmm4 + addsd %xmm4, %xmm2 + movsd %xmm2, -48(%rsp) + movsd -48(%rsp), %xmm3 + movsd (%rcx,%r8,8), %xmm2 + subsd %xmm3, %xmm7 + addsd %xmm4, %xmm7 + movsd %xmm7, -40(%rsp) + movsd -48(%rsp), %xmm3 + movaps %xmm3, %xmm5 + movaps %xmm3, %xmm0 + mulsd %xmm3, %xmm5 + addsd %xmm2, %xmm0 + mulsd %xmm5, %xmm6 + movsd -40(%rsp), %xmm10 + movsd %xmm0, 
-48(%rsp) + movsd -48(%rsp), %xmm1 + addsd 1864+__datan_la_CoutTab(%rip), %xmm6 + subsd %xmm1, %xmm2 + mulsd %xmm5, %xmm6 + addsd %xmm3, %xmm2 + addsd 1856+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + movsd %xmm2, -40(%rsp) + movsd -48(%rsp), %xmm9 + movsd -40(%rsp), %xmm8 + addsd 1848+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1840+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1832+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1824+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + mulsd %xmm3, %xmm6 + addsd %xmm6, %xmm10 + addsd 8(%rcx,%r8,8), %xmm10 + addsd %xmm8, %xmm10 + addsd %xmm9, %xmm10 + movsd %xmm10, -24(%rsp) + movb -17(%rsp), %r9b + andb $127, %r9b + orb %al, %r9b + movb %r9b, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_5: + movsd 1912+__datan_la_CoutTab(%rip), %xmm0 + shlb $7, %al + addsd 1920+__datan_la_CoutTab(%rip), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_6: + comisd 1880+__datan_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movaps %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + shlb $7, %al + movsd 1872+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1864+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1856+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1848+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1840+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1832+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1824+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + mulsd %xmm0, %xmm2 + addsd %xmm0, %xmm2 + movsd %xmm2, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_8: + movzwl -10(%rsp), %edx + testl $32752, %edx + je .LBL_2_10 + movsd 
1904+__datan_la_CoutTab(%rip), %xmm1 + shlb $7, %al + addsd %xmm0, %xmm1 + movsd %xmm1, -48(%rsp) + movsd -48(%rsp), %xmm0 + mulsd -16(%rsp), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_10: + mulsd %xmm0, %xmm0 + shlb $7, %al + movsd %xmm0, -48(%rsp) + movsd -48(%rsp), %xmm0 + addsd -16(%rsp), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + +.LBL_2_11: + xorl %eax, %eax + ret + +.LBL_2_12: + testl $1048575, 4(%rdi) + jne .LBL_2_15 + cmpl $0, (%rdi) + jne .LBL_2_15 + movsd 1912+__datan_la_CoutTab(%rip), %xmm0 + movb 7(%rdi), %al + andb $-128, %al + addsd 1920+__datan_la_CoutTab(%rip), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rcx + movq %rcx, (%rsi) + jmp .LBL_2_11 + +.LBL_2_15: + movsd (%rdi), %xmm0 + addsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + jmp .LBL_2_11 + + cfi_endproc + + .type __svml_datan_cout_rare_internal,@function + .size __svml_datan_cout_rare_internal,.-__svml_datan_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_datan_data_internal_avx512: + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 0 + .long 0 + .long 
0 + .long 0 + .long 0 + .long 0 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 0 + .long 4180443357 + .long 1070553973 + .long 90291023 + .long 1071492199 + .long 2737217249 + .long 1071945615 + .long 1413754136 + .long 1072243195 + .long 1468297118 + .long 1072475260 + .long 3531732635 + .long 1072657163 + .long 744202399 + .long 1072747407 + .long 2464923204 + .long 1072805601 + .long 1436891685 + .long 1072853231 + .long 2037009832 + .long 1072892781 + .long 1826698067 + .long 1072926058 + .long 1803191648 + .long 1072954391 + .long 2205372832 + .long 1072978772 + .long 4234512805 + .long 1072999952 + .long 3932628503 + .long 1073018509 + .long 2501811453 + .long 1073034892 + .long 866379431 + .long 1073049455 + .long 1376865888 + .long 1073062480 + .long 3290094269 + .long 1073074195 + .long 354764887 + .long 1073084787 + .long 3332975497 + .long 1073094406 + .long 1141460092 + .long 1073103181 + .long 745761286 + .long 1073111216 + .long 1673304509 + .long 1073118600 + .long 983388243 + .long 1073125409 + .long 3895509104 + .long 1073131706 + .long 2128523669 + .long 1073137548 + .long 2075485693 + .long 1073142981 + .long 121855980 + .long 1073148047 + .long 4181733783 + .long 1073152780 + .long 2887813284 + .long 1073157214 + .long 0 + .long 0 + .long 1022865341 + .long 1013492590 + .long 573531618 + .long 1014639487 + .long 2280825944 + .long 1014120858 + .long 856972295 + .long 1015129638 + .long 986810987 + .long 1015077601 + .long 2062601149 + .long 1013974920 + .long 589036912 + .long 3164328156 + .long 1787331214 + .long 1016798022 + .long 2942272763 + .long 3164235441 + .long 2956702105 + .long 1016472908 + .long 3903328092 + .long 3162582135 + .long 
3175026820 + .long 3158589859 + .long 787328196 + .long 1014621351 + .long 2317874517 + .long 3163795518 + .long 4071621134 + .long 1016673529 + .long 2492111345 + .long 3164172103 + .long 3606178875 + .long 3162371821 + .long 3365790232 + .long 1014547152 + .long 2710887773 + .long 1017086651 + .long 2755350986 + .long 3162706257 + .long 198095269 + .long 3162802133 + .long 2791076759 + .long 3164364640 + .long 4214434319 + .long 3162164074 + .long 773754012 + .long 3164190653 + .long 139561443 + .long 3164313657 + .long 2197796619 + .long 3164066219 + .long 3592486882 + .long 1016669082 + .long 1148791015 + .long 3163724934 + .long 386789398 + .long 3163117479 + .long 2518816264 + .long 3162291736 + .long 2545101323 + .long 3164592727 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .type __svml_datan_data_internal_avx512,@object + .size __svml_datan_data_internal_avx512,1344 + .align 32 + +.FLT_15: + .long 0x00000078,0x00000000,0x00000078,0x00000000,0x00000078,0x00000000,0x00000078,0x00000000 + .type .FLT_15,@object + .size .FLT_15,32 + .align 32 + +__datan_la_CoutTab: + .long 3892314112 + .long 1069799150 + .long 2332892550 + .long 1039715405 + .long 1342177280 + .long 1070305495 + .long 270726690 + .long 1041535749 + .long 939524096 + .long 1070817911 + .long 2253973841 + .long 3188654726 + .long 3221225472 + .long 1071277294 + .long 3853927037 + .long 1043226911 + .long 2818572288 + .long 1071767563 + .long 2677759107 + .long 1044314101 + .long 3355443200 + .long 1072103591 + .long 1636578514 + .long 3191094734 + .long 1476395008 + .long 1072475260 + .long 1864703685 + .long 3188646936 + .long 805306368 + .long 1072747407 + .long 192551812 + .long 3192726267 + .long 2013265920 + .long 1072892781 + .long 2240369452 + 
.long 1043768538 + .long 0 + .long 1072999953 + .long 3665168337 + .long 3192705970 + .long 402653184 + .long 1073084787 + .long 1227953434 + .long 3192313277 + .long 2013265920 + .long 1073142981 + .long 3853283127 + .long 1045277487 + .long 805306368 + .long 1073187261 + .long 1676192264 + .long 3192868861 + .long 134217728 + .long 1073217000 + .long 4290763938 + .long 1042034855 + .long 671088640 + .long 1073239386 + .long 994303084 + .long 3189643768 + .long 402653184 + .long 1073254338 + .long 1878067156 + .long 1042652475 + .long 1610612736 + .long 1073265562 + .long 670314820 + .long 1045138554 + .long 3221225472 + .long 1073273048 + .long 691126919 + .long 3189987794 + .long 3489660928 + .long 1073278664 + .long 1618990832 + .long 3188194509 + .long 1207959552 + .long 1073282409 + .long 2198872939 + .long 1044806069 + .long 3489660928 + .long 1073285217 + .long 2633982383 + .long 1042307894 + .long 939524096 + .long 1073287090 + .long 1059367786 + .long 3189114230 + .long 2281701376 + .long 1073288494 + .long 3158525533 + .long 1044484961 + .long 3221225472 + .long 1073289430 + .long 286581777 + .long 1044893263 + .long 4026531840 + .long 1073290132 + .long 2000245215 + .long 3191647611 + .long 134217728 + .long 1073290601 + .long 4205071590 + .long 1045035927 + .long 536870912 + .long 1073290952 + .long 2334392229 + .long 1043447393 + .long 805306368 + .long 1073291186 + .long 2281458177 + .long 3188885569 + .long 3087007744 + .long 1073291361 + .long 691611507 + .long 1044733832 + .long 3221225472 + .long 1073291478 + .long 1816229550 + .long 1044363390 + .long 2281701376 + .long 1073291566 + .long 1993843750 + .long 3189837440 + .long 134217728 + .long 1073291625 + .long 3654754496 + .long 1044970837 + .long 4026531840 + .long 1073291668 + .long 3224300229 + .long 3191935390 + .long 805306368 + .long 1073291698 + .long 2988777976 + .long 3188950659 + .long 536870912 + .long 1073291720 + .long 1030371341 + .long 1043402665 + .long 3221225472 + .long 
1073291734 + .long 1524463765 + .long 1044361356 + .long 3087007744 + .long 1073291745 + .long 2754295320 + .long 1044731036 + .long 134217728 + .long 1073291753 + .long 3099629057 + .long 1044970710 + .long 2281701376 + .long 1073291758 + .long 962914160 + .long 3189838838 + .long 805306368 + .long 1073291762 + .long 3543908206 + .long 3188950786 + .long 4026531840 + .long 1073291764 + .long 1849909620 + .long 3191935434 + .long 3221225472 + .long 1073291766 + .long 1641333636 + .long 1044361352 + .long 536870912 + .long 1073291768 + .long 1373968792 + .long 1043402654 + .long 134217728 + .long 1073291769 + .long 2033191599 + .long 1044970710 + .long 3087007744 + .long 1073291769 + .long 4117947437 + .long 1044731035 + .long 805306368 + .long 1073291770 + .long 315378368 + .long 3188950787 + .long 2281701376 + .long 1073291770 + .long 2428571750 + .long 3189838838 + .long 3221225472 + .long 1073291770 + .long 1608007466 + .long 1044361352 + .long 4026531840 + .long 1073291770 + .long 1895711420 + .long 3191935434 + .long 134217728 + .long 1073291771 + .long 2031108713 + .long 1044970710 + .long 536870912 + .long 1073291771 + .long 1362518342 + .long 1043402654 + .long 805306368 + .long 1073291771 + .long 317461253 + .long 3188950787 + .long 939524096 + .long 1073291771 + .long 4117231784 + .long 1044731035 + .long 1073741824 + .long 1073291771 + .long 1607942376 + .long 1044361352 + .long 1207959552 + .long 1073291771 + .long 2428929577 + .long 3189838838 + .long 1207959552 + .long 1073291771 + .long 2031104645 + .long 1044970710 + .long 1342177280 + .long 1073291771 + .long 1895722602 + .long 3191935434 + .long 1342177280 + .long 1073291771 + .long 317465322 + .long 3188950787 + .long 1342177280 + .long 1073291771 + .long 1362515546 + .long 1043402654 + .long 1342177280 + .long 1073291771 + .long 1607942248 + .long 1044361352 + .long 1342177280 + .long 1073291771 + .long 4117231610 + .long 1044731035 + .long 1342177280 + .long 1073291771 + .long 2031104637 + 
.long 1044970710 + .long 1342177280 + .long 1073291771 + .long 1540251232 + .long 1045150466 + .long 1342177280 + .long 1073291771 + .long 2644671394 + .long 1045270303 + .long 1342177280 + .long 1073291771 + .long 2399244691 + .long 1045360181 + .long 1342177280 + .long 1073291771 + .long 803971124 + .long 1045420100 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192879152 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192849193 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192826724 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192811744 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192800509 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192793019 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192787402 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192783657 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192780848 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192778976 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192777572 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192776635 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192775933 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192775465 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192775114 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774880 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774704 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774587 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774500 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774441 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774397 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774368 + .long 
1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774346 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774331 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774320 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774313 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774308 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774304 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774301 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774299 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774298 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774297 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1466225875 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1343512524 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1251477510 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1190120835 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1144103328 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1113424990 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1090416237 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1075077068 + .long 3192774295 + .long 1431655765 + .long 
3218429269 + .long 2576978363 + .long 1070176665 + .long 2453154343 + .long 3217180964 + .long 4189149139 + .long 1069314502 + .long 1775019125 + .long 3216459198 + .long 273199057 + .long 1068739452 + .long 874748308 + .long 3215993277 + .long 0 + .long 1017118720 + .long 0 + .long 1069547520 + .long 0 + .long 1129316352 + .long 0 + .long 1072693248 + .long 1413754136 + .long 1073291771 + .long 856972295 + .long 1016178214 + .long 33554432 + .long 1101004800 + .type __datan_la_CoutTab,@object + .size __datan_la_CoutTab,1936 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S new file mode 100644 index 0000000000..723734e10b --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized atan, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define _ZGVeN8v_atan _ZGVeN8v_atan_avx2_wrapper +#include "../svml_d_atan8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c new file mode 100644 index 0000000000..e97a41b6bc --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized atan, vector length is 8. 
+ Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define SYMBOL_NAME _ZGVeN8v_atan +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8v_atan, __GI__ZGVeN8v_atan, __redirect__ZGVeN8v_atan) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S new file mode 100644 index 0000000000..727d6ab71c --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S @@ -0,0 +1,1323 @@ +/* Function atan vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * For 0.0 <= x <= 7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x) + * For 7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x) + * For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x) + * For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x) + * For 39.0/16.0 <= x <= inf : atan(x) = atan(inf) + atan(s), where s=-1.0/x + * Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0. + * + */ + +#include + + .text +ENTRY(_ZGVeN8v_atan_skx) + vmovups 64+__svml_datan_data_internal_avx512(%rip), %zmm4 + vmovups 128+__svml_datan_data_internal_avx512(%rip), %zmm3 + vmovups 256+__svml_datan_data_internal_avx512(%rip), %zmm9 + +/* saturate X range */ + vmovups 320+__svml_datan_data_internal_avx512(%rip), %zmm7 + vandpd __svml_datan_data_internal_avx512(%rip), %zmm0, %zmm8 + +/* R+Rl = DiffX/Y */ + vbroadcastsd .FLT_14(%rip), %zmm15 + vaddpd {rn-sae}, %zmm4, %zmm8, %zmm2 + vxorpd %zmm0, %zmm8, %zmm1 + vcmppd $29, {sae}, %zmm3, %zmm8, %k2 + +/* round to 2 bits after binary point */ + vreducepd $40, {sae}, %zmm8, %zmm6 + vsubpd {rn-sae}, %zmm4, %zmm2, %zmm5 + +/* + * if|X|>=MaxThreshold, set DiffX=-1 + * VMSUB(D, DiffX, LargeMask, Zero, One); + */ + vblendmpd 192+__svml_datan_data_internal_avx512(%rip), %zmm6, %zmm10{%k2} + vfmadd231pd {rn-sae}, %zmm8, %zmm5, %zmm9 + vmovups 960+__svml_datan_data_internal_avx512(%rip), %zmm5 + +/* table lookup sequence */ + vmovups 448+__svml_datan_data_internal_avx512(%rip), %zmm6 + vgetmantpd $0, {sae}, %zmm10, %zmm14 + vgetexppd {sae}, %zmm10, %zmm11 + vmovups 1408+__svml_datan_data_internal_avx512(%rip), %zmm10 + +/* + * if|X|>=MaxThreshold, set Y=X + * VMADD(D, Y, LargeMask, X, Zero); + */ + vminpd {sae}, %zmm8, 
%zmm7, %zmm9{%k2} + vcmppd $29, {sae}, %zmm5, %zmm2, %k1 + vmovups 576+__svml_datan_data_internal_avx512(%rip), %zmm7 + vmovups 1152+__svml_datan_data_internal_avx512(%rip), %zmm8 + vgetmantpd $0, {sae}, %zmm9, %zmm3 + vgetexppd {sae}, %zmm9, %zmm12 + vmovups 1280+__svml_datan_data_internal_avx512(%rip), %zmm9 + vpermt2pd 512+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm6 + vsubpd {rn-sae}, %zmm12, %zmm11, %zmm4 + vpermt2pd 640+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm7 + vrcp14pd %zmm3, %zmm13 + vmovups 1344+__svml_datan_data_internal_avx512(%rip), %zmm12 + vmovups 1472+__svml_datan_data_internal_avx512(%rip), %zmm11 + vblendmpd %zmm7, %zmm6, %zmm2{%k1} + vmulpd {rn-sae}, %zmm13, %zmm14, %zmm0 + vfnmadd231pd {rn-sae}, %zmm3, %zmm13, %zmm15 + vfnmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm3 + vfmadd213pd {rn-sae}, %zmm15, %zmm15, %zmm15 + vfmadd213pd {rn-sae}, %zmm13, %zmm13, %zmm15 + vfmadd213pd {rn-sae}, %zmm0, %zmm15, %zmm3 + vscalefpd {rn-sae}, %zmm4, %zmm3, %zmm0 + +/* set table value to Pi/2 for large X */ + vblendmpd 1024+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm3{%k2} + vmovups 1216+__svml_datan_data_internal_avx512(%rip), %zmm2 + +/* polynomial evaluation */ + vmulpd {rn-sae}, %zmm0, %zmm0, %zmm14 + vmulpd {rn-sae}, %zmm14, %zmm14, %zmm13 + vmulpd {rn-sae}, %zmm0, %zmm14, %zmm15 + vfmadd231pd {rn-sae}, %zmm14, %zmm8, %zmm2 + vfmadd231pd {rn-sae}, %zmm14, %zmm9, %zmm12 + vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm14 + vfmadd213pd {rn-sae}, %zmm12, %zmm13, %zmm2 + vfmadd213pd {rn-sae}, %zmm14, %zmm13, %zmm2 + vfmadd213pd {rn-sae}, %zmm0, %zmm15, %zmm2 + vaddpd {rn-sae}, %zmm3, %zmm2, %zmm0 + vxorpd %zmm1, %zmm0, %zmm0 + ret + +END(_ZGVeN8v_atan_skx) + + .align 16,0x90 + +__svml_datan_cout_rare_internal: + + cfi_startproc + + movzwl 6(%rdi), %r8d + andl $32752, %r8d + shrl $4, %r8d + cmpl $2047, %r8d + je .LBL_2_12 + movq (%rdi), %rdx + movq %rdx, -16(%rsp) + shrq $56, %rdx + movb 7(%rdi), %al + andl $127, %edx + movb %dl, -9(%rsp) + 
movsd -16(%rsp), %xmm0 + shrb $7, %al + comisd 1888+__datan_la_CoutTab(%rip), %xmm0 + movl -12(%rsp), %ecx + jb .LBL_2_6 + movsd 1896+__datan_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_5 + movl 4(%rdi), %edx + movl %ecx, %edi + andl $-524288, %ecx + andl $-1048576, %edi + addl $262144, %ecx + movaps %xmm0, %xmm9 + andl $1048575, %ecx + movaps %xmm0, %xmm10 + movsd %xmm0, -56(%rsp) + orl %ecx, %edi + movl $0, -56(%rsp) + andl $1048575, %edx + movl %edi, -52(%rsp) + lea __datan_la_CoutTab(%rip), %rcx + movsd 1928+__datan_la_CoutTab(%rip), %xmm4 + movsd -56(%rsp), %xmm15 + shll $20, %r8d + subsd -56(%rsp), %xmm9 + mulsd 1928+__datan_la_CoutTab(%rip), %xmm10 + shlb $7, %al + mulsd %xmm9, %xmm4 + movsd %xmm4, -48(%rsp) + orl %edx, %r8d + movsd -48(%rsp), %xmm5 + addl $-1069547520, %r8d + sarl $18, %r8d + subsd %xmm9, %xmm5 + movsd %xmm5, -40(%rsp) + andl $-2, %r8d + movsd -48(%rsp), %xmm7 + movsd -40(%rsp), %xmm6 + movslq %r8d, %r8 + subsd %xmm6, %xmm7 + movsd %xmm7, -48(%rsp) + movsd -48(%rsp), %xmm8 + movsd 1904+__datan_la_CoutTab(%rip), %xmm6 + subsd %xmm8, %xmm9 + movsd %xmm9, -40(%rsp) + movsd -48(%rsp), %xmm2 + movsd -40(%rsp), %xmm3 + movsd %xmm10, -48(%rsp) + movsd -48(%rsp), %xmm11 + movsd 1904+__datan_la_CoutTab(%rip), %xmm8 + subsd -16(%rsp), %xmm11 + movsd %xmm11, -40(%rsp) + movsd -48(%rsp), %xmm13 + movsd -40(%rsp), %xmm12 + subsd %xmm12, %xmm13 + movsd %xmm13, -48(%rsp) + movsd -48(%rsp), %xmm14 + subsd %xmm14, %xmm0 + movsd 1904+__datan_la_CoutTab(%rip), %xmm14 + movsd %xmm0, -40(%rsp) + movsd -48(%rsp), %xmm5 + movsd -40(%rsp), %xmm4 + mulsd %xmm15, %xmm5 + mulsd %xmm15, %xmm4 + movaps %xmm5, %xmm1 + addsd %xmm4, %xmm1 + movsd %xmm1, -48(%rsp) + movsd -48(%rsp), %xmm0 + subsd %xmm0, %xmm5 + addsd %xmm4, %xmm5 + movsd 1928+__datan_la_CoutTab(%rip), %xmm4 + movsd %xmm5, -40(%rsp) + movsd -48(%rsp), %xmm11 + movsd -40(%rsp), %xmm1 + addsd %xmm11, %xmm6 + movsd %xmm6, -48(%rsp) + movsd -48(%rsp), %xmm7 + subsd %xmm7, %xmm8 + movsd %xmm8, 
-40(%rsp) + movsd -48(%rsp), %xmm10 + movsd -40(%rsp), %xmm9 + addsd %xmm9, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -40(%rsp), %xmm12 + movsd 1928+__datan_la_CoutTab(%rip), %xmm10 + addsd %xmm11, %xmm12 + movsd %xmm12, -40(%rsp) + movsd -32(%rsp), %xmm13 + movsd 1904+__datan_la_CoutTab(%rip), %xmm11 + subsd %xmm13, %xmm14 + movsd %xmm14, -32(%rsp) + movsd -40(%rsp), %xmm0 + movsd -32(%rsp), %xmm15 + addsd %xmm15, %xmm0 + movsd %xmm0, -32(%rsp) + movsd -48(%rsp), %xmm9 + mulsd %xmm9, %xmm4 + movsd -32(%rsp), %xmm0 + movsd %xmm4, -48(%rsp) + addsd %xmm1, %xmm0 + movsd -48(%rsp), %xmm5 + subsd %xmm9, %xmm5 + movsd %xmm5, -40(%rsp) + movsd -48(%rsp), %xmm7 + movsd -40(%rsp), %xmm6 + subsd %xmm6, %xmm7 + movsd 1904+__datan_la_CoutTab(%rip), %xmm6 + movsd %xmm7, -48(%rsp) + movsd -48(%rsp), %xmm8 + subsd %xmm8, %xmm9 + movsd %xmm9, -40(%rsp) + movsd -48(%rsp), %xmm4 + divsd %xmm4, %xmm11 + mulsd %xmm11, %xmm10 + movsd -40(%rsp), %xmm5 + movsd %xmm10, -40(%rsp) + addsd %xmm0, %xmm5 + movsd -40(%rsp), %xmm12 + subsd %xmm11, %xmm12 + movsd %xmm12, -32(%rsp) + movsd -40(%rsp), %xmm10 + movsd -32(%rsp), %xmm13 + subsd %xmm13, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -32(%rsp), %xmm14 + mulsd %xmm14, %xmm4 + movsd -32(%rsp), %xmm15 + subsd %xmm4, %xmm6 + mulsd %xmm15, %xmm5 + movsd %xmm5, -40(%rsp) + movsd -40(%rsp), %xmm1 + subsd %xmm1, %xmm6 + movsd %xmm6, -40(%rsp) + movsd -40(%rsp), %xmm4 + movsd -32(%rsp), %xmm5 + movsd -40(%rsp), %xmm0 + movaps %xmm5, %xmm7 + movsd -32(%rsp), %xmm1 + mulsd %xmm3, %xmm5 + addsd 1904+__datan_la_CoutTab(%rip), %xmm4 + mulsd %xmm2, %xmm7 + mulsd %xmm0, %xmm4 + mulsd %xmm1, %xmm4 + mulsd %xmm4, %xmm3 + mulsd %xmm4, %xmm2 + addsd %xmm3, %xmm5 + movsd 1872+__datan_la_CoutTab(%rip), %xmm6 + addsd %xmm2, %xmm5 + movsd %xmm5, -48(%rsp) + movaps %xmm7, %xmm2 + movsd -48(%rsp), %xmm4 + addsd %xmm4, %xmm2 + movsd %xmm2, -48(%rsp) + movsd -48(%rsp), %xmm3 + movsd (%rcx,%r8,8), %xmm2 + subsd %xmm3, %xmm7 + addsd %xmm4, %xmm7 + movsd %xmm7, -40(%rsp) + 
movsd -48(%rsp), %xmm3 + movaps %xmm3, %xmm5 + movaps %xmm3, %xmm0 + mulsd %xmm3, %xmm5 + addsd %xmm2, %xmm0 + mulsd %xmm5, %xmm6 + movsd -40(%rsp), %xmm10 + movsd %xmm0, -48(%rsp) + movsd -48(%rsp), %xmm1 + addsd 1864+__datan_la_CoutTab(%rip), %xmm6 + subsd %xmm1, %xmm2 + mulsd %xmm5, %xmm6 + addsd %xmm3, %xmm2 + addsd 1856+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + movsd %xmm2, -40(%rsp) + movsd -48(%rsp), %xmm9 + movsd -40(%rsp), %xmm8 + addsd 1848+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1840+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1832+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + addsd 1824+__datan_la_CoutTab(%rip), %xmm6 + mulsd %xmm5, %xmm6 + mulsd %xmm3, %xmm6 + addsd %xmm6, %xmm10 + addsd 8(%rcx,%r8,8), %xmm10 + addsd %xmm8, %xmm10 + addsd %xmm9, %xmm10 + movsd %xmm10, -24(%rsp) + movb -17(%rsp), %r9b + andb $127, %r9b + orb %al, %r9b + movb %r9b, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_5: + movsd 1912+__datan_la_CoutTab(%rip), %xmm0 + shlb $7, %al + addsd 1920+__datan_la_CoutTab(%rip), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_6: + comisd 1880+__datan_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movaps %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + shlb $7, %al + movsd 1872+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1864+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1856+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1848+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1840+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1832+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + addsd 1824+__datan_la_CoutTab(%rip), %xmm2 + mulsd %xmm1, %xmm2 + mulsd %xmm0, %xmm2 + addsd %xmm0, %xmm2 + movsd %xmm2, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb 
%dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_8: + movzwl -10(%rsp), %edx + testl $32752, %edx + je .LBL_2_10 + movsd 1904+__datan_la_CoutTab(%rip), %xmm1 + shlb $7, %al + addsd %xmm0, %xmm1 + movsd %xmm1, -48(%rsp) + movsd -48(%rsp), %xmm0 + mulsd -16(%rsp), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + jmp .LBL_2_11 + +.LBL_2_10: + mulsd %xmm0, %xmm0 + shlb $7, %al + movsd %xmm0, -48(%rsp) + movsd -48(%rsp), %xmm0 + addsd -16(%rsp), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rax + movq %rax, (%rsi) + +.LBL_2_11: + xorl %eax, %eax + ret + +.LBL_2_12: + testl $1048575, 4(%rdi) + jne .LBL_2_15 + cmpl $0, (%rdi) + jne .LBL_2_15 + movsd 1912+__datan_la_CoutTab(%rip), %xmm0 + movb 7(%rdi), %al + andb $-128, %al + addsd 1920+__datan_la_CoutTab(%rip), %xmm0 + movsd %xmm0, -24(%rsp) + movb -17(%rsp), %dl + andb $127, %dl + orb %al, %dl + movb %dl, -17(%rsp) + movq -24(%rsp), %rcx + movq %rcx, (%rsi) + jmp .LBL_2_11 + +.LBL_2_15: + movsd (%rdi), %xmm0 + addsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + jmp .LBL_2_11 + + cfi_endproc + + .type __svml_datan_cout_rare_internal,@function + .size __svml_datan_cout_rare_internal,.-__svml_datan_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_datan_data_internal_avx512: + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + .long 0 + .long 1125646336 + 
.long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 1075806208 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 1206910976 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 4180443357 + .long 1070553973 + .long 90291023 + .long 1071492199 + .long 2737217249 + .long 1071945615 + .long 1413754136 + .long 1072243195 + .long 1468297118 + .long 1072475260 + .long 3531732635 + .long 1072657163 + .long 744202399 + .long 1072747407 + .long 2464923204 + .long 1072805601 + .long 1436891685 + .long 1072853231 + .long 2037009832 + .long 1072892781 + .long 1826698067 + .long 1072926058 + .long 1803191648 + .long 1072954391 + .long 2205372832 + .long 1072978772 + .long 4234512805 + .long 1072999952 + .long 3932628503 + .long 1073018509 + .long 2501811453 + .long 1073034892 + .long 866379431 + .long 1073049455 + .long 1376865888 + .long 1073062480 + .long 3290094269 + .long 1073074195 + .long 354764887 + .long 1073084787 + .long 3332975497 + .long 1073094406 + .long 1141460092 + .long 1073103181 + .long 745761286 + .long 1073111216 + .long 1673304509 + 
.long 1073118600 + .long 983388243 + .long 1073125409 + .long 3895509104 + .long 1073131706 + .long 2128523669 + .long 1073137548 + .long 2075485693 + .long 1073142981 + .long 121855980 + .long 1073148047 + .long 4181733783 + .long 1073152780 + .long 2887813284 + .long 1073157214 + .long 0 + .long 0 + .long 1022865341 + .long 1013492590 + .long 573531618 + .long 1014639487 + .long 2280825944 + .long 1014120858 + .long 856972295 + .long 1015129638 + .long 986810987 + .long 1015077601 + .long 2062601149 + .long 1013974920 + .long 589036912 + .long 3164328156 + .long 1787331214 + .long 1016798022 + .long 2942272763 + .long 3164235441 + .long 2956702105 + .long 1016472908 + .long 3903328092 + .long 3162582135 + .long 3175026820 + .long 3158589859 + .long 787328196 + .long 1014621351 + .long 2317874517 + .long 3163795518 + .long 4071621134 + .long 1016673529 + .long 2492111345 + .long 3164172103 + .long 3606178875 + .long 3162371821 + .long 3365790232 + .long 1014547152 + .long 2710887773 + .long 1017086651 + .long 2755350986 + .long 3162706257 + .long 198095269 + .long 3162802133 + .long 2791076759 + .long 3164364640 + .long 4214434319 + .long 3162164074 + .long 773754012 + .long 3164190653 + .long 139561443 + .long 3164313657 + .long 2197796619 + .long 3164066219 + .long 3592486882 + .long 1016669082 + .long 1148791015 + .long 3163724934 + .long 386789398 + .long 3163117479 + .long 2518816264 + .long 3162291736 + .long 2545101323 + .long 3164592727 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 16 + .long 1125646336 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + .long 1073291771 + .long 1413754136 + 
.long 1073291771 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 856972295 + .long 1016178214 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 4123328151 + .long 1068689849 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 3295121612 + .long 3216458327 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 4026078880 + .long 1069314495 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2398029018 + .long 3217180964 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 2576905246 + .long 1070176665 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .long 
1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .long 1431655757 + .long 3218429269 + .type __svml_datan_data_internal_avx512,@object + .size __svml_datan_data_internal_avx512,1536 + .align 32 + +__datan_la_CoutTab: + .long 3892314112 + .long 1069799150 + .long 2332892550 + .long 1039715405 + .long 1342177280 + .long 1070305495 + .long 270726690 + .long 1041535749 + .long 939524096 + .long 1070817911 + .long 2253973841 + .long 3188654726 + .long 3221225472 + .long 1071277294 + .long 3853927037 + .long 1043226911 + .long 2818572288 + .long 1071767563 + .long 2677759107 + .long 1044314101 + .long 3355443200 + .long 1072103591 + .long 1636578514 + .long 3191094734 + .long 1476395008 + .long 1072475260 + .long 1864703685 + .long 3188646936 + .long 805306368 + .long 1072747407 + .long 192551812 + .long 3192726267 + .long 2013265920 + .long 1072892781 + .long 2240369452 + .long 1043768538 + .long 0 + .long 1072999953 + .long 3665168337 + .long 3192705970 + .long 402653184 + .long 1073084787 + .long 1227953434 + .long 3192313277 + .long 2013265920 + .long 1073142981 + .long 3853283127 + .long 1045277487 + .long 805306368 + .long 1073187261 + .long 1676192264 + .long 3192868861 + .long 134217728 + .long 1073217000 + .long 4290763938 + .long 1042034855 + .long 671088640 + .long 1073239386 + .long 994303084 + .long 3189643768 + .long 402653184 + .long 1073254338 + .long 1878067156 + .long 1042652475 + .long 1610612736 + .long 1073265562 + .long 670314820 + .long 1045138554 + .long 3221225472 + .long 1073273048 + .long 691126919 + .long 3189987794 + .long 3489660928 + .long 1073278664 + .long 1618990832 + .long 3188194509 + .long 1207959552 + .long 1073282409 + .long 2198872939 + .long 1044806069 + .long 3489660928 + .long 1073285217 + .long 2633982383 + .long 1042307894 + .long 939524096 + .long 1073287090 + .long 1059367786 + .long 3189114230 + .long 2281701376 + .long 1073288494 + .long 3158525533 + .long 
1044484961 + .long 3221225472 + .long 1073289430 + .long 286581777 + .long 1044893263 + .long 4026531840 + .long 1073290132 + .long 2000245215 + .long 3191647611 + .long 134217728 + .long 1073290601 + .long 4205071590 + .long 1045035927 + .long 536870912 + .long 1073290952 + .long 2334392229 + .long 1043447393 + .long 805306368 + .long 1073291186 + .long 2281458177 + .long 3188885569 + .long 3087007744 + .long 1073291361 + .long 691611507 + .long 1044733832 + .long 3221225472 + .long 1073291478 + .long 1816229550 + .long 1044363390 + .long 2281701376 + .long 1073291566 + .long 1993843750 + .long 3189837440 + .long 134217728 + .long 1073291625 + .long 3654754496 + .long 1044970837 + .long 4026531840 + .long 1073291668 + .long 3224300229 + .long 3191935390 + .long 805306368 + .long 1073291698 + .long 2988777976 + .long 3188950659 + .long 536870912 + .long 1073291720 + .long 1030371341 + .long 1043402665 + .long 3221225472 + .long 1073291734 + .long 1524463765 + .long 1044361356 + .long 3087007744 + .long 1073291745 + .long 2754295320 + .long 1044731036 + .long 134217728 + .long 1073291753 + .long 3099629057 + .long 1044970710 + .long 2281701376 + .long 1073291758 + .long 962914160 + .long 3189838838 + .long 805306368 + .long 1073291762 + .long 3543908206 + .long 3188950786 + .long 4026531840 + .long 1073291764 + .long 1849909620 + .long 3191935434 + .long 3221225472 + .long 1073291766 + .long 1641333636 + .long 1044361352 + .long 536870912 + .long 1073291768 + .long 1373968792 + .long 1043402654 + .long 134217728 + .long 1073291769 + .long 2033191599 + .long 1044970710 + .long 3087007744 + .long 1073291769 + .long 4117947437 + .long 1044731035 + .long 805306368 + .long 1073291770 + .long 315378368 + .long 3188950787 + .long 2281701376 + .long 1073291770 + .long 2428571750 + .long 3189838838 + .long 3221225472 + .long 1073291770 + .long 1608007466 + .long 1044361352 + .long 4026531840 + .long 1073291770 + .long 1895711420 + .long 3191935434 + .long 134217728 + .long 
1073291771 + .long 2031108713 + .long 1044970710 + .long 536870912 + .long 1073291771 + .long 1362518342 + .long 1043402654 + .long 805306368 + .long 1073291771 + .long 317461253 + .long 3188950787 + .long 939524096 + .long 1073291771 + .long 4117231784 + .long 1044731035 + .long 1073741824 + .long 1073291771 + .long 1607942376 + .long 1044361352 + .long 1207959552 + .long 1073291771 + .long 2428929577 + .long 3189838838 + .long 1207959552 + .long 1073291771 + .long 2031104645 + .long 1044970710 + .long 1342177280 + .long 1073291771 + .long 1895722602 + .long 3191935434 + .long 1342177280 + .long 1073291771 + .long 317465322 + .long 3188950787 + .long 1342177280 + .long 1073291771 + .long 1362515546 + .long 1043402654 + .long 1342177280 + .long 1073291771 + .long 1607942248 + .long 1044361352 + .long 1342177280 + .long 1073291771 + .long 4117231610 + .long 1044731035 + .long 1342177280 + .long 1073291771 + .long 2031104637 + .long 1044970710 + .long 1342177280 + .long 1073291771 + .long 1540251232 + .long 1045150466 + .long 1342177280 + .long 1073291771 + .long 2644671394 + .long 1045270303 + .long 1342177280 + .long 1073291771 + .long 2399244691 + .long 1045360181 + .long 1342177280 + .long 1073291771 + .long 803971124 + .long 1045420100 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192879152 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192849193 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192826724 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192811744 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192800509 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192793019 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192787402 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192783657 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192780848 + .long 1476395008 + .long 1073291771 + .long 1650295902 
+ .long 3192778976 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192777572 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192776635 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192775933 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192775465 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192775114 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774880 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774704 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774587 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774500 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774441 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774397 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774368 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774346 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774331 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774320 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774313 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774308 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774304 + .long 1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774301 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774299 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774298 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774297 + .long 1476395008 + .long 1073291771 + .long 3613709523 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 177735686 + .long 3192774296 + .long 1476395008 + .long 1073291771 + .long 3490996172 + .long 3192774295 + .long 
1476395008 + .long 1073291771 + .long 2754716064 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 2263862659 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1895722605 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1650295902 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1466225875 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1343512524 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1251477510 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1190120835 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1144103328 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1113424990 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1090416237 + .long 3192774295 + .long 1476395008 + .long 1073291771 + .long 1075077068 + .long 3192774295 + .long 1431655765 + .long 3218429269 + .long 2576978363 + .long 1070176665 + .long 2453154343 + .long 3217180964 + .long 4189149139 + .long 1069314502 + .long 1775019125 + .long 3216459198 + .long 273199057 + .long 1068739452 + .long 874748308 + .long 3215993277 + .long 0 + .long 1017118720 + .long 0 + .long 1069547520 + .long 0 + .long 1129316352 + .long 0 + .long 1072693248 + .long 1413754136 + .long 1073291771 + .long 856972295 + .long 1016178214 + .long 33554432 + .long 1101004800 + .type __datan_la_CoutTab,@object + .size __datan_la_CoutTab,1936 + .align 8 + +.FLT_14: + .long 0x00000000,0x3ff00000 + .type .FLT_14,@object + .size .FLT_14,8 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S new file mode 100644 index 0000000000..27623cdf16 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized atanf. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVeN16v_atanf _ZGVeN16v_atanf_avx2_wrapper +#include "../svml_s_atanf16_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c new file mode 100644 index 0000000000..940de26615 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized atanf, vector length is 16. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>.
*/ + +#define SYMBOL_NAME _ZGVeN16v_atanf +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16v_atanf, __GI__ZGVeN16v_atanf, + __redirect__ZGVeN16v_atanf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S new file mode 100644 index 0000000000..aa35261d9b --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S @@ -0,0 +1,352 @@ +/* Function atanf vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * For 0.0 <= x <= 7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x) + * For 7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x) + * For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x) + * For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x) + * For 39.0/16.0 <= x <= inf : atan(x) = atan(inf) + atan(s), where s=-1.0/x + * Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/0.16. 
+ * + */ + +#include + + .text +ENTRY(_ZGVeN16v_atanf_skx) + vandps __svml_satan_data_internal_avx512(%rip), %zmm0, %zmm7 + vmovups 128+__svml_satan_data_internal_avx512(%rip), %zmm3 + vmovups 256+__svml_satan_data_internal_avx512(%rip), %zmm8 + +/* round to 2 bits after binary point */ + vreduceps $40, {sae}, %zmm7, %zmm5 + +/* saturate X range */ + vmovups 320+__svml_satan_data_internal_avx512(%rip), %zmm6 + vmovups 64+__svml_satan_data_internal_avx512(%rip), %zmm2 + vcmpps $29, {sae}, %zmm3, %zmm7, %k1 + +/* table lookup sequence */ + vmovups 448+__svml_satan_data_internal_avx512(%rip), %zmm3 + vsubps {rn-sae}, %zmm5, %zmm7, %zmm4 + vaddps {rn-sae}, %zmm2, %zmm7, %zmm1 + vxorps %zmm0, %zmm7, %zmm0 + vfmadd231ps {rn-sae}, %zmm7, %zmm4, %zmm8 + vmovups 896+__svml_satan_data_internal_avx512(%rip), %zmm4 + +/* if|X|>=MaxThreshold, set DiffX=-1 */ + vblendmps 192+__svml_satan_data_internal_avx512(%rip), %zmm5, %zmm9{%k1} + vmovups 960+__svml_satan_data_internal_avx512(%rip), %zmm5 + +/* if|X|>=MaxThreshold, set Y=X */ + vminps {sae}, %zmm7, %zmm6, %zmm8{%k1} + +/* R+Rl = DiffX/Y */ + vgetmantps $0, {sae}, %zmm9, %zmm12 + vgetexpps {sae}, %zmm9, %zmm10 + vpermt2ps 512+__svml_satan_data_internal_avx512(%rip), %zmm1, %zmm3 + vgetmantps $0, {sae}, %zmm8, %zmm15 + vgetexpps {sae}, %zmm8, %zmm11 + vmovups 832+__svml_satan_data_internal_avx512(%rip), %zmm1 + +/* set table value to Pi/2 for large X */ + vblendmps 704+__svml_satan_data_internal_avx512(%rip), %zmm3, %zmm9{%k1} + vrcp14ps %zmm15, %zmm13 + vsubps {rn-sae}, %zmm11, %zmm10, %zmm2 + vmulps {rn-sae}, %zmm13, %zmm12, %zmm14 + vfnmadd213ps {rn-sae}, %zmm12, %zmm14, %zmm15 + vfmadd213ps {rn-sae}, %zmm14, %zmm13, %zmm15 + vscalefps {rn-sae}, %zmm2, %zmm15, %zmm7 + +/* polynomial evaluation */ + vmulps {rn-sae}, %zmm7, %zmm7, %zmm8 + vmulps {rn-sae}, %zmm7, %zmm8, %zmm6 + vfmadd231ps {rn-sae}, %zmm8, %zmm1, %zmm4 + vfmadd213ps {rn-sae}, %zmm5, %zmm4, %zmm8 + vfmadd213ps {rn-sae}, %zmm7, %zmm6, %zmm8 + vaddps {rn-sae}, 
%zmm9, %zmm8, %zmm10 + vxorps %zmm0, %zmm10, %zmm0 + ret + +END(_ZGVeN16v_atanf_skx) + + .section .rodata, "a" + .align 64 + +__svml_satan_data_internal_avx512: + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1241513984 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 1089994752 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 1333788672 + .long 0 + 
.long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 1048239024 + .long 1055744824 + .long 1059372157 + .long 1061752795 + .long 1063609315 + .long 1065064543 + .long 1065786489 + .long 1066252045 + .long 1066633083 + .long 1066949484 + .long 1067215699 + .long 1067442363 + .long 1067637412 + .long 1067806856 + .long 1067955311 + .long 1068086373 + .long 1068202874 + .long 1068307075 + .long 1068400798 + .long 1068485529 + .long 1068562486 + .long 1068632682 + .long 1068696961 + .long 1068756035 + .long 1068810506 + .long 1068860887 + .long 1068907620 + .long 1068951084 + .long 1068991608 + .long 1069029480 + .long 1069064949 + .long 0 + .long 2975494116 + .long 833369962 + .long 835299256 + .long 2998648110 + .long 2995239174 + .long 3000492182 + .long 860207626 + .long 3008447516 + .long 3005590622 + .long 3000153675 + .long 860754741 + .long 859285590 + .long 844944488 + .long 2993069463 + .long 858157665 + .long 3006142000 + .long 3007693206 + .long 3009342234 + .long 847469400 + .long 3006114683 + .long 852829553 + .long 847325583 + .long 860305056 + .long 846145135 + .long 2997638646 + .long 855837703 + .long 2979047222 + .long 2995344192 + .long 854092798 + .long 3000498637 + .long 859965578 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3007036718 + .long 3188697310 + .long 3188697310 + .long 3188697310 + 
.long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 3188697310 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 1045219554 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .long 3198855850 + .type __svml_satan_data_internal_avx512,@object + .size __svml_satan_data_internal_avx512,1024 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S new file mode 100644 index 0000000000..fe81170666 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized atanf, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVbN4v_atanf _ZGVbN4v_atanf_sse2 +#include "../svml_s_atanf4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c new file mode 100644 index 0000000000..975ece6812 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized atanf, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define SYMBOL_NAME _ZGVbN4v_atanf +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4v_atanf, __GI__ZGVbN4v_atanf, + __redirect__ZGVbN4v_atanf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S new file mode 100644 index 0000000000..45d9803519 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S @@ -0,0 +1,884 @@ +/* Function atanf vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library.
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * For 0.0 <= x <= 7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x) + * For 7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x) + * For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x) + * For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x) + * For 39.0/16.0 <= x <= inf : atan(x) = atan(inf) + atan(s), where s=-1.0/x + * Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/0.16. 
+ * + */ + +#include + + .text +ENTRY(_ZGVbN4v_atanf_sse4) + movaps %xmm0, %xmm6 + +/* + * To use minps\maxps operations for argument reduction + * uncomment _AT_USEMINMAX_ definition + * Declarations + * Variables + * Constants + */ + movups 64+__svml_satan_data_internal(%rip), %xmm3 + +/* + * 1) If x>1, then r=-1/x, PIO2=Pi/2 + * 2) If -1<=x<=1, then r=x, PIO2=0 + * 3) If x<-1, then r=-1/x, PIO2=-Pi/2 + */ + movups 128+__svml_satan_data_internal(%rip), %xmm2 + andps %xmm6, %xmm3 + movaps %xmm3, %xmm1 + movaps %xmm2, %xmm4 + cmpleps %xmm2, %xmm1 + maxps %xmm3, %xmm4 + minps %xmm3, %xmm2 + divps %xmm4, %xmm2 + movups __svml_satan_data_internal(%rip), %xmm5 + movaps %xmm1, %xmm0 + andps %xmm5, %xmm6 + andnps %xmm5, %xmm1 + pxor %xmm6, %xmm1 + pxor %xmm2, %xmm1 + +/* Polynomial. */ + movaps %xmm1, %xmm4 + mulps %xmm1, %xmm4 + movaps %xmm4, %xmm3 + mulps %xmm4, %xmm3 + movups 384+__svml_satan_data_internal(%rip), %xmm2 + mulps %xmm3, %xmm2 + movups 448+__svml_satan_data_internal(%rip), %xmm7 + mulps %xmm3, %xmm7 + addps 512+__svml_satan_data_internal(%rip), %xmm2 + mulps %xmm3, %xmm2 + addps 576+__svml_satan_data_internal(%rip), %xmm7 + mulps %xmm3, %xmm7 + addps 640+__svml_satan_data_internal(%rip), %xmm2 + mulps %xmm3, %xmm2 + addps 704+__svml_satan_data_internal(%rip), %xmm7 + mulps %xmm7, %xmm3 + addps 768+__svml_satan_data_internal(%rip), %xmm2 + mulps %xmm4, %xmm2 + addps 832+__svml_satan_data_internal(%rip), %xmm3 + andnps 256+__svml_satan_data_internal(%rip), %xmm0 + addps %xmm2, %xmm3 + mulps %xmm3, %xmm4 + pxor %xmm6, %xmm0 + addps 896+__svml_satan_data_internal(%rip), %xmm4 + +/* Reconstruction. 
*/ + mulps %xmm4, %xmm1 + addps %xmm1, %xmm0 + ret + +END(_ZGVbN4v_atanf_sse4) + + .section .rodata, "a" + .align 64 + +__svml_satan_data_internal: + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 993144000 + .long 993144000 + .long 993144000 + .long 993144000 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1026278276 + .long 1026278276 + .long 1026278276 + .long 1026278276 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 
1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_satan_data_internal,@object + .size __svml_satan_data_internal,960 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S new file mode 100644 index 0000000000..1652a8f5c6 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized atanf, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define _ZGVdN8v_atanf _ZGVdN8v_atanf_sse_wrapper +#include "../svml_s_atanf8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c new file mode 100644 index 0000000000..733d8c3bc3 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized atanf, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVdN8v_atanf +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8v_atanf, __GI__ZGVdN8v_atanf, + __redirect__ZGVdN8v_atanf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S new file mode 100644 index 0000000000..0453d30551 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S @@ -0,0 +1,688 @@ +/* Function atanf vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * For 0.0 <= x <= 7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x) + * For 7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x) + * For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x) + * For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x) + * For 39.0/16.0 <= x <= inf : atan(x) = atan(inf) + atan(s), where s=-1.0/x + * Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0. 
+ * + */ + +#include + + .text +ENTRY(_ZGVdN8v_atanf_avx2) +/* + * 1) If x>1, then r=-1/x, PIO2=Pi/2 + * 2) If -1<=x<=1, then r=x, PIO2=0 + * 3) If x<-1, then r=-1/x, PIO2=-Pi/2 + */ + vmovups 128+__svml_satan_data_internal(%rip), %ymm7 + vmovups __svml_satan_data_internal(%rip), %ymm4 + vmovaps %ymm0, %ymm5 + +/* + * To use minps\maxps operations for argument reduction + * uncomment _AT_USEMINMAX_ definition + * Declarations + * Variables + * Constants + */ + vandps 64+__svml_satan_data_internal(%rip), %ymm5, %ymm0 + vcmple_oqps %ymm7, %ymm0, %ymm3 + vandps %ymm4, %ymm5, %ymm2 + vmaxps %ymm0, %ymm7, %ymm5 + vminps %ymm0, %ymm7, %ymm0 + vandnps 256+__svml_satan_data_internal(%rip), %ymm3, %ymm6 + vxorps %ymm2, %ymm6, %ymm1 + vdivps %ymm5, %ymm0, %ymm6 + vmovups 384+__svml_satan_data_internal(%rip), %ymm0 + vandnps %ymm4, %ymm3, %ymm3 + vxorps %ymm2, %ymm3, %ymm2 + vxorps %ymm6, %ymm2, %ymm3 + +/* Polynomial. */ + vmulps %ymm3, %ymm3, %ymm7 + vmovups 448+__svml_satan_data_internal(%rip), %ymm2 + vmulps %ymm7, %ymm7, %ymm4 + vfmadd213ps 512+__svml_satan_data_internal(%rip), %ymm4, %ymm0 + vfmadd213ps 576+__svml_satan_data_internal(%rip), %ymm4, %ymm2 + vfmadd213ps 640+__svml_satan_data_internal(%rip), %ymm4, %ymm0 + vfmadd213ps 704+__svml_satan_data_internal(%rip), %ymm4, %ymm2 + vfmadd213ps 768+__svml_satan_data_internal(%rip), %ymm4, %ymm0 + vfmadd213ps 832+__svml_satan_data_internal(%rip), %ymm4, %ymm2 + vfmadd213ps %ymm2, %ymm7, %ymm0 + vfmadd213ps 896+__svml_satan_data_internal(%rip), %ymm7, %ymm0 + +/* Reconstruction. 
*/ + vfmadd213ps %ymm1, %ymm3, %ymm0 + ret + +END(_ZGVdN8v_atanf_avx2) + + .section .rodata, "a" + .align 64 + +__svml_satan_data_internal: + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .long 1073741824 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .long 1070141403 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 993144000 + .long 993144000 + .long 993144000 + .long 993144000 + .long 993144000 + .long 993144000 + .long 993144000 + .long 993144000 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .long 3162449457 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1026278276 + .long 1026278276 + .long 1026278276 + .long 1026278276 + .long 
1026278276 + .long 1026278276 + .long 1026278276 + .long 1026278276 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .long 3180885545 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .long 1037657204 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .long 3188810232 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .long 1045215135 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .long 3198855753 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_satan_data_internal,@object + .size __svml_satan_data_internal,960 diff --git a/sysdeps/x86_64/fpu/svml_d_atan2_core.S b/sysdeps/x86_64/fpu/svml_d_atan2_core.S new file mode 100644 index 0000000000..e86d5b7047 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_atan2_core.S @@ -0,0 +1,29 @@ +/* Function atan vectorized with SSE2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVbN2v_atan) +WRAPPER_IMPL_SSE2 atan +END (_ZGVbN2v_atan) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN2v_atan) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_atan4_core.S b/sysdeps/x86_64/fpu/svml_d_atan4_core.S new file mode 100644 index 0000000000..eb11fd2f17 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_atan4_core.S @@ -0,0 +1,29 @@ +/* Function atan vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVdN4v_atan) +WRAPPER_IMPL_AVX _ZGVbN2v_atan +END (_ZGVdN4v_atan) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN4v_atan) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S new file mode 100644 index 0000000000..b83a4be33d --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S @@ -0,0 +1,25 @@ +/* Function atan vectorized in AVX ISA as wrapper to SSE4 ISA version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVcN4v_atan) +WRAPPER_IMPL_AVX _ZGVbN2v_atan +END (_ZGVcN4v_atan) diff --git a/sysdeps/x86_64/fpu/svml_d_atan8_core.S b/sysdeps/x86_64/fpu/svml_d_atan8_core.S new file mode 100644 index 0000000000..9685a32bdc --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_atan8_core.S @@ -0,0 +1,25 @@ +/* Function atan vectorized with AVX-512, wrapper to AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVeN8v_atan) +WRAPPER_IMPL_AVX512 _ZGVdN4v_atan +END (_ZGVeN8v_atan) diff --git a/sysdeps/x86_64/fpu/svml_s_atanf16_core.S b/sysdeps/x86_64/fpu/svml_s_atanf16_core.S new file mode 100644 index 0000000000..f82d2422ae --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_atanf16_core.S @@ -0,0 +1,25 @@ +/* Function atanf vectorized with AVX-512. Wrapper to AVX2 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVeN16v_atanf) +WRAPPER_IMPL_AVX512 _ZGVdN8v_atanf +END (_ZGVeN16v_atanf) diff --git a/sysdeps/x86_64/fpu/svml_s_atanf4_core.S b/sysdeps/x86_64/fpu/svml_s_atanf4_core.S new file mode 100644 index 0000000000..6b8c4d9624 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_atanf4_core.S @@ -0,0 +1,29 @@ +/* Function atanf vectorized with SSE2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVbN4v_atanf) +WRAPPER_IMPL_SSE2 atanf +END (_ZGVbN4v_atanf) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN4v_atanf) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_atanf8_core.S b/sysdeps/x86_64/fpu/svml_s_atanf8_core.S new file mode 100644 index 0000000000..315681f6c0 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_atanf8_core.S @@ -0,0 +1,29 @@ +/* Function atanf vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVdN8v_atanf) +WRAPPER_IMPL_AVX _ZGVbN4v_atanf +END (_ZGVdN8v_atanf) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN8v_atanf) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S new file mode 100644 index 0000000000..b9cd502186 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S @@ -0,0 +1,25 @@ +/* Function atanf vectorized in AVX ISA as wrapper to SSE4 ISA version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVcN8v_atanf) +WRAPPER_IMPL_AVX _ZGVbN4v_atanf +END (_ZGVcN8v_atanf) diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c new file mode 100644 index 0000000000..0f7176a20b --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c @@ -0,0 +1 @@ +#include "test-double-libmvec-atan.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c new file mode 100644 index 0000000000..0f7176a20b --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c @@ -0,0 +1 @@ +#include "test-double-libmvec-atan.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c new file mode 100644 index 0000000000..0f7176a20b --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c @@ -0,0 +1 @@ +#include "test-double-libmvec-atan.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan.c new file mode 100644 index 0000000000..982687b169 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE double +#define LIBMVEC_FUNC atan +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c index 50b4d7c02f..7abe3211c8 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos) VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh) VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin) VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVbN2v_asinh) +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c 
b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c index 255df14db5..1537ed25cc 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c @@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos) VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh) VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin) VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVdN4v_asinh) +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan) #ifndef __ILP32__ # define VEC_INT_TYPE __m256i diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c index 355560d081..27bcc9c59a 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos) VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh) VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin) VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVcN4v_asinh) +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c index b7ecf9c8f3..2333349893 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos) VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh) VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin) VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVeN8v_asinh) +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan) #ifndef __ILP32__ # define VEC_INT_TYPE __m512i diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c new file mode 100644 index 0000000000..9251c65f8a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c @@ -0,0 +1 @@ +#include "test-float-libmvec-atanf.c" diff --git 
a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c new file mode 100644 index 0000000000..9251c65f8a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c @@ -0,0 +1 @@ +#include "test-float-libmvec-atanf.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c new file mode 100644 index 0000000000..9251c65f8a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c @@ -0,0 +1 @@ +#include "test-float-libmvec-atanf.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c new file mode 100644 index 0000000000..2a8ab87e86 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE float +#define LIBMVEC_FUNC atanf +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c index b1e47a4434..723651140e 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf) VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf) VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf) VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVeN16v_asinhf) +VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf) #define VEC_INT_TYPE __m512i diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c index b2607f8467..da77149021 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf) VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf) VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf) VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVbN4v_asinhf) +VECTOR_WRAPPER 
(WRAPPER_NAME (atanf), _ZGVbN4v_atanf) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c index 35e36c4510..a978f37e79 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c @@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf) VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf) VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf) VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVdN8v_asinhf) +VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf) /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf. */ #undef VECTOR_WRAPPER_fFF diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c index b01e99f690..1ae9a8c3c0 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf) VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf) VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf) VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVcN8v_asinhf) +VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf) #define VEC_INT_TYPE __m128i
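
For readers following the AVX2 atanf kernel (_ZGVdN8v_atanf_avx2) above, here is a scalar C sketch of what one vector lane appears to compute. It is an illustration under stated assumptions, not code from the patch. The names atanf_sketch, float_bits and bits_float are hypothetical. The constants are copied from the .long entries of the AVX2 __svml_satan_data_internal table, assuming a 64-byte stride per entry as the load offsets in the kernel suggest. The sketch follows the three-case reduction in the comment after ENTRY (r = x for |x| <= 1, r = -1/x otherwise, plus a signed Pi/2 term) and then evaluates the polynomial as two interleaved chains in r^4, mirroring the vfmadd213ps sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back, as needed to
   model the .long table entries and the vandps/vxorps bit tricks.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  assert (sizeof u == sizeof f);
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical scalar model of one lane; not part of the patch.
   NaN inputs are not modeled, and plain multiply-add is used instead
   of fused multiply-add.  */
float
atanf_sketch (float x)
{
  /* Constants copied from the AVX2 data table (assumed offsets).  */
  const uint32_t sign_mask = 2147483648u;       /* offset 0 */
  const uint32_t abs_mask = 2147483647u;        /* offset 64 */
  const float one = bits_float (1065353216u);   /* offsets 128, 896 */
  const uint32_t pio2_bits = 1070141403u;       /* offset 256 */
  static const uint32_t coeff_bits[8] = {       /* offsets 384..832 */
    993144000u, 3162449457u, 1026278276u, 3180885545u,
    1037657204u, 3188810232u, 1045215135u, 3198855753u
  };

  /* Argument reduction.  */
  uint32_t xb = float_bits (x);
  uint32_t sign = xb & sign_mask;
  float ax = bits_float (xb & abs_mask);
  uint32_t big = (ax <= one) ? 0u : 0xffffffffu;   /* all ones if |x| > 1 */
  float hi = (ax > one) ? ax : one;                /* max (|x|, 1) */
  float lo = (ax < one) ? ax : one;                /* min (|x|, 1) */
  /* base is +-Pi/2 when |x| > 1, otherwise a signed zero.  */
  float base = bits_float ((big & pio2_bits) ^ sign);
  /* r is x when |x| <= 1, otherwise -1/x.  */
  float r = bits_float (float_bits (lo / hi) ^ sign ^ (big & sign_mask));

  /* Polynomial: two chains in r^4, merged with r^2 at the end.  */
  float r2 = r * r;
  float r4 = r2 * r2;
  float p0 = bits_float (coeff_bits[0]);
  float p1 = bits_float (coeff_bits[1]);
  for (int i = 2; i < 8; i += 2)
    {
      p0 = p0 * r4 + bits_float (coeff_bits[i]);
      p1 = p1 * r4 + bits_float (coeff_bits[i + 1]);
    }
  float poly = (p0 * r2 + p1) * r2 + one;

  /* Reconstruction.  */
  return poly * r + base;
}
```

Calling atanf_sketch (2.0f) takes the |x| > 1 path: r = -0.5 and the result is Pi/2 plus r times the polynomial. Because the sketch does not use FMA and ignores NaN handling, its results may differ from the vector kernel in the last bits or for special inputs; it is meant only as a reading aid for the instruction sequence.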