From patchwork Wed Nov 24 19:37:42 2021
X-Patchwork-Id: 48086
From: Sunil Pandey
Reply-To: Sunil K Pandey
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com
Subject: [PATCH 17/42] x86-64: Add vector cosh/coshf implementation to libmvec
Date: Wed, 24 Nov 2021 11:37:42 -0800
Message-Id: <20211124193807.2093208-18-skpgkp2@gmail.com>
In-Reply-To: <20211124193807.2093208-1-skpgkp2@gmail.com>
References: <20211124193807.2093208-1-skpgkp2@gmail.com>

Implement vectorized cosh/coshf for libmvec, with SSE, AVX, AVX2 and AVX512 versions as per the vector ABI. The patch also adds accuracy and ABI tests for vector cosh/coshf, with regenerated ulps.
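For readers who want the method in scalar form: the SSE4 kernel's header comment describes it as cosh(x) = (exp(x)+exp(-x))/2 with exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r). The following is a minimal C sketch of that decomposition, not the code in this patch. The sketch_* names, the choice K = 2 (a 4-entry table), the degree-8 Taylor polynomial and the restriction to roughly |x| < 700 are assumptions made for illustration; NaN, infinity and overflow handling are left out.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative parameters (assumptions, not the patch's values). */
#define SKETCH_K 2
#define SKETCH_TABLE_SIZE (1 << SKETCH_K)

/* 2^(j/4) for j = 0..3. */
static const double sketch_exp2_table[SKETCH_TABLE_SIZE] = {
    1.0,
    1.189207115002721,
    1.4142135623730951,
    1.681792830507429
};

/* Build 2^m from its IEEE-754 bit pattern; valid for -1022 <= m <= 1023. */
static double sketch_pow2(int m)
{
    uint64_t bits = (uint64_t)(m + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* exp(x) = 2^M * 2^(j/2^K) * exp(r), for moderate |x| only. */
static double sketch_exp(double x)
{
    const double inv_ln2 = 1.4426950408889634;       /* 1/ln(2) */
    const double ln2_hi = 6.93147180369123816490e-01; /* high part of ln(2) */
    const double ln2_lo = 1.90821492927058770002e-10; /* low part of ln(2) */
    const double rshifter = 6755399441055744.0;       /* 1.5 * 2^52 */

    /* dM = x*2^K/ln(2) + RShifter: the addition rounds to an integer. */
    volatile double dm = x * (SKETCH_TABLE_SIZE * inv_ln2) + rshifter;
    double dn = dm - rshifter;                        /* dN = dM - RShifter */

    int n = (int)dn;                                  /* n = M*2^K + j */
    int j = n & (SKETCH_TABLE_SIZE - 1);              /* table index, 0..3 */
    int m = (n - j) / SKETCH_TABLE_SIZE;              /* exponent M */

    /* r = (x - dN*ln2_hi/2^K) - dN*ln2_lo/2^K, so |r| <= ln(2)/8. */
    double r = (x - dn * (ln2_hi * 0.25)) - dn * (ln2_lo * 0.25);

    /* Degree-8 Taylor polynomial for exp(r), evaluated by Horner's rule. */
    double p = 1.0 / 40320.0;
    p = p * r + 1.0 / 5040.0;
    p = p * r + 1.0 / 720.0;
    p = p * r + 1.0 / 120.0;
    p = p * r + 1.0 / 24.0;
    p = p * r + 1.0 / 6.0;
    p = p * r + 0.5;
    p = p * r + 1.0;
    p = p * r + 1.0;

    return sketch_pow2(m) * sketch_exp2_table[j] * p;
}

/* cosh(x) = (exp(|x|) + exp(-|x|)) / 2; cosh is even, so the sign is dropped. */
static double sketch_cosh(double x)
{
    double ax = x < 0.0 ? -x : x;
    return 0.5 * (sketch_exp(ax) + sketch_exp(-ax));
}
```

Calling sketch_cosh(1.0) should agree with the libm cosh(1.0) to well within 1e-12. Judging by its comments (dTp, dTn, sinh(dR)), the kernel in this patch does not evaluate exp twice; it appears to combine the table entries for +N and -N and apply short polynomials in r instead, and it routes special inputs to a scalar fallback.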
--- bits/libm-simd-decl-stubs.h | 11 + math/bits/mathcalls.h | 2 +- .../unix/sysv/linux/x86_64/libmvec.abilist | 8 + sysdeps/x86/fpu/bits/math-vector.h | 4 + sysdeps/x86_64/fpu/Makeconfig | 1 + sysdeps/x86_64/fpu/Versions | 2 + sysdeps/x86_64/fpu/libm-test-ulps | 20 + .../fpu/multiarch/svml_d_cosh2_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_d_cosh2_core.c | 27 + .../fpu/multiarch/svml_d_cosh2_core_sse4.S | 2491 +++++++++++++++++ .../fpu/multiarch/svml_d_cosh4_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_d_cosh4_core.c | 27 + .../fpu/multiarch/svml_d_cosh4_core_avx2.S | 2250 +++++++++++++++ .../fpu/multiarch/svml_d_cosh8_core-avx2.S | 20 + .../x86_64/fpu/multiarch/svml_d_cosh8_core.c | 27 + .../fpu/multiarch/svml_d_cosh8_core_avx512.S | 1694 +++++++++++ .../fpu/multiarch/svml_s_coshf16_core-avx2.S | 20 + .../fpu/multiarch/svml_s_coshf16_core.c | 28 + .../multiarch/svml_s_coshf16_core_avx512.S | 1190 ++++++++ .../fpu/multiarch/svml_s_coshf4_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_s_coshf4_core.c | 28 + .../fpu/multiarch/svml_s_coshf4_core_sse4.S | 2104 ++++++++++++++ .../fpu/multiarch/svml_s_coshf8_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_s_coshf8_core.c | 28 + .../fpu/multiarch/svml_s_coshf8_core_avx2.S | 1803 ++++++++++++ sysdeps/x86_64/fpu/svml_d_cosh2_core.S | 29 + sysdeps/x86_64/fpu/svml_d_cosh4_core.S | 29 + sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S | 25 + sysdeps/x86_64/fpu/svml_d_cosh8_core.S | 25 + sysdeps/x86_64/fpu/svml_s_coshf16_core.S | 25 + sysdeps/x86_64/fpu/svml_s_coshf4_core.S | 29 + sysdeps/x86_64/fpu/svml_s_coshf8_core.S | 29 + sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S | 25 + .../x86_64/fpu/test-double-libmvec-cosh-avx.c | 1 + .../fpu/test-double-libmvec-cosh-avx2.c | 1 + .../fpu/test-double-libmvec-cosh-avx512f.c | 1 + sysdeps/x86_64/fpu/test-double-libmvec-cosh.c | 3 + .../x86_64/fpu/test-double-vlen2-wrappers.c | 1 + .../fpu/test-double-vlen4-avx2-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen4-wrappers.c | 1 + 
.../x86_64/fpu/test-double-vlen8-wrappers.c | 1 + .../x86_64/fpu/test-float-libmvec-coshf-avx.c | 1 + .../fpu/test-float-libmvec-coshf-avx2.c | 1 + .../fpu/test-float-libmvec-coshf-avx512f.c | 1 + sysdeps/x86_64/fpu/test-float-libmvec-coshf.c | 3 + .../x86_64/fpu/test-float-vlen16-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen4-wrappers.c | 1 + .../fpu/test-float-vlen8-avx2-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen8-wrappers.c | 1 + 49 files changed, 12100 insertions(+), 1 deletion(-) create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh2_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S create mode 100644 
sysdeps/x86_64/fpu/svml_d_cosh8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf.c diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h index b282298fe3..f028b18620 100644 --- a/bits/libm-simd-decl-stubs.h +++ b/bits/libm-simd-decl-stubs.h @@ -186,4 +186,15 @@ #define __DECL_SIMD_cbrtf32x #define __DECL_SIMD_cbrtf64x #define __DECL_SIMD_cbrtf128x + +#define __DECL_SIMD_cosh +#define __DECL_SIMD_coshf +#define __DECL_SIMD_coshl +#define __DECL_SIMD_coshf16 +#define __DECL_SIMD_coshf32 +#define __DECL_SIMD_coshf64 +#define __DECL_SIMD_coshf128 +#define __DECL_SIMD_coshf32x +#define __DECL_SIMD_coshf64x +#define __DECL_SIMD_coshf128x #endif diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h index 180e81e678..bc5a81b365 100644 --- a/math/bits/mathcalls.h +++ b/math/bits/mathcalls.h @@ -68,7 +68,7 @@ __MATHCALL (tan,, (_Mdouble_ __x)); /* Hyperbolic functions. */ /* Hyperbolic cosine of X. */ -__MATHCALL (cosh,, (_Mdouble_ __x)); +__MATHCALL_VEC (cosh,, (_Mdouble_ __x)); /* Hyperbolic sine of X. */ __MATHCALL (sinh,, (_Mdouble_ __x)); /* Hyperbolic tangent of X. 
*/ diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist index 1cf8e91ffb..4b5753146e 100644 --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist @@ -53,6 +53,7 @@ GLIBC_2.35 _ZGVbN2v_asinh F GLIBC_2.35 _ZGVbN2v_atan F GLIBC_2.35 _ZGVbN2v_atanh F GLIBC_2.35 _ZGVbN2v_cbrt F +GLIBC_2.35 _ZGVbN2v_cosh F GLIBC_2.35 _ZGVbN2vv_atan2 F GLIBC_2.35 _ZGVbN4v_acosf F GLIBC_2.35 _ZGVbN4v_acoshf F @@ -61,6 +62,7 @@ GLIBC_2.35 _ZGVbN4v_asinhf F GLIBC_2.35 _ZGVbN4v_atanf F GLIBC_2.35 _ZGVbN4v_atanhf F GLIBC_2.35 _ZGVbN4v_cbrtf F +GLIBC_2.35 _ZGVbN4v_coshf F GLIBC_2.35 _ZGVbN4vv_atan2f F GLIBC_2.35 _ZGVcN4v_acos F GLIBC_2.35 _ZGVcN4v_acosh F @@ -69,6 +71,7 @@ GLIBC_2.35 _ZGVcN4v_asinh F GLIBC_2.35 _ZGVcN4v_atan F GLIBC_2.35 _ZGVcN4v_atanh F GLIBC_2.35 _ZGVcN4v_cbrt F +GLIBC_2.35 _ZGVcN4v_cosh F GLIBC_2.35 _ZGVcN4vv_atan2 F GLIBC_2.35 _ZGVcN8v_acosf F GLIBC_2.35 _ZGVcN8v_acoshf F @@ -77,6 +80,7 @@ GLIBC_2.35 _ZGVcN8v_asinhf F GLIBC_2.35 _ZGVcN8v_atanf F GLIBC_2.35 _ZGVcN8v_atanhf F GLIBC_2.35 _ZGVcN8v_cbrtf F +GLIBC_2.35 _ZGVcN8v_coshf F GLIBC_2.35 _ZGVcN8vv_atan2f F GLIBC_2.35 _ZGVdN4v_acos F GLIBC_2.35 _ZGVdN4v_acosh F @@ -85,6 +89,7 @@ GLIBC_2.35 _ZGVdN4v_asinh F GLIBC_2.35 _ZGVdN4v_atan F GLIBC_2.35 _ZGVdN4v_atanh F GLIBC_2.35 _ZGVdN4v_cbrt F +GLIBC_2.35 _ZGVdN4v_cosh F GLIBC_2.35 _ZGVdN4vv_atan2 F GLIBC_2.35 _ZGVdN8v_acosf F GLIBC_2.35 _ZGVdN8v_acoshf F @@ -93,6 +98,7 @@ GLIBC_2.35 _ZGVdN8v_asinhf F GLIBC_2.35 _ZGVdN8v_atanf F GLIBC_2.35 _ZGVdN8v_atanhf F GLIBC_2.35 _ZGVdN8v_cbrtf F +GLIBC_2.35 _ZGVdN8v_coshf F GLIBC_2.35 _ZGVdN8vv_atan2f F GLIBC_2.35 _ZGVeN16v_acosf F GLIBC_2.35 _ZGVeN16v_acoshf F @@ -101,6 +107,7 @@ GLIBC_2.35 _ZGVeN16v_asinhf F GLIBC_2.35 _ZGVeN16v_atanf F GLIBC_2.35 _ZGVeN16v_atanhf F GLIBC_2.35 _ZGVeN16v_cbrtf F +GLIBC_2.35 _ZGVeN16v_coshf F GLIBC_2.35 _ZGVeN16vv_atan2f F GLIBC_2.35 _ZGVeN8v_acos F GLIBC_2.35 _ZGVeN8v_acosh F @@ -109,4 +116,5 @@ 
GLIBC_2.35 _ZGVeN8v_asinh F GLIBC_2.35 _ZGVeN8v_atan F GLIBC_2.35 _ZGVeN8v_atanh F GLIBC_2.35 _ZGVeN8v_cbrt F +GLIBC_2.35 _ZGVeN8v_cosh F GLIBC_2.35 _ZGVeN8vv_atan2 F diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h index 5b0a2d9efe..a8465a23a7 100644 --- a/sysdeps/x86/fpu/bits/math-vector.h +++ b/sysdeps/x86/fpu/bits/math-vector.h @@ -90,6 +90,10 @@ # define __DECL_SIMD_cbrt __DECL_SIMD_x86_64 # undef __DECL_SIMD_cbrtf # define __DECL_SIMD_cbrtf __DECL_SIMD_x86_64 +# undef __DECL_SIMD_cosh +# define __DECL_SIMD_cosh __DECL_SIMD_x86_64 +# undef __DECL_SIMD_coshf +# define __DECL_SIMD_coshf __DECL_SIMD_x86_64 # endif #endif diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig index 398a112dab..a0dc2914e6 100644 --- a/sysdeps/x86_64/fpu/Makeconfig +++ b/sysdeps/x86_64/fpu/Makeconfig @@ -31,6 +31,7 @@ libmvec-funcs = \ atanh \ cbrt \ cos \ + cosh \ exp \ log \ pow \ diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions index 45baab6b6e..82b5723aca 100644 --- a/sysdeps/x86_64/fpu/Versions +++ b/sysdeps/x86_64/fpu/Versions @@ -21,6 +21,7 @@ libmvec { _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan; _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh; _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt; + _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh; _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2; _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf; _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf; @@ -29,6 +30,7 @@ libmvec { _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf; _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf; _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf; + _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf; _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f; } } diff --git 
a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps index 8b681ed441..d9296ac12e 100644 --- a/sysdeps/x86_64/fpu/libm-test-ulps +++ b/sysdeps/x86_64/fpu/libm-test-ulps @@ -988,6 +988,26 @@ float: 2 float128: 3 ldouble: 3 +Function: "cosh_vlen16": +float: 1 + +Function: "cosh_vlen2": +double: 1 + +Function: "cosh_vlen4": +double: 1 +float: 1 + +Function: "cosh_vlen4_avx2": +double: 1 + +Function: "cosh_vlen8": +double: 1 +float: 1 + +Function: "cosh_vlen8_avx2": +float: 1 + Function: Real part of "cpow": double: 2 float: 5 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S new file mode 100644 index 0000000000..bfe4e3d0f0 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized cosh, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVbN2v_cosh _ZGVbN2v_cosh_sse2 +#include "../svml_d_cosh2_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c new file mode 100644 index 0000000000..99561fea47 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized cosh, vector length is 2. 
+ Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVbN2v_cosh +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN2v_cosh, __GI__ZGVbN2v_cosh, __redirect__ZGVbN2v_cosh) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S new file mode 100644 index 0000000000..b998007e68 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S @@ -0,0 +1,2491 @@ +/* Function cosh vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * Compute cosh(x) as (exp(x)+exp(-x))/2, + * where exp is calculated as + * exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r) + * + * Special cases: + * + * cosh(NaN) = quiet NaN (invalid exception is raised for a signaling NaN) + * cosh(+/-INF) = +INF + * cosh(0) = 1 + * cosh(x) overflows for |x| > MAXLOG+log(2) and returns +INF + * + */ + +#include + + .text +ENTRY(_ZGVbN2v_cosh_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + movaps %xmm0, %xmm5 + movups 3648+__svml_dcosh_data_internal(%rip), %xmm1 + lea 896+__svml_dcosh_data_internal(%rip), %rax + +/* Abs argument */ + movaps %xmm1, %xmm4 + +/* dXSign=0x001000000000 */ + psrlq $11, %xmm1 + +/* + * Load argument + * dM = x*2^K/log(2) + RShifter + */ + movups 3008+__svml_dcosh_data_internal(%rip), %xmm2 + andnps %xmm5, %xmm4 + mulpd %xmm4, %xmm2 + movups 3200+__svml_dcosh_data_internal(%rip), %xmm3 + addpd %xmm3, %xmm2 + movq 3712+__svml_dcosh_data_internal(%rip), %xmm0 + +/* + * Check for overflow/underflow + * + */ + pshufd $221, %xmm4, %xmm7 + movq 3264+__svml_dcosh_data_internal(%rip), %xmm6 + pcmpgtd %xmm0, %xmm7 + +/* Index and lookup */ + pshufd $136, %xmm2, %xmm0 + movmskps %xmm7, %edx + pand %xmm6, %xmm0 + +/* + * trick + * 256=-iIndex + */ + movq 3520+__svml_dcosh_data_internal(%rip), %xmm7 + +/* iIndex <<= 3 (byte offset into the table of doubles) */ + movdqa %xmm0, %xmm6 + psubd %xmm0, %xmm7 + pslld $3, %xmm6 + +/* iDomainRange <<= 3 */ + pslld $3, %xmm7 + movd %xmm7, %r9d + pshufd $1, %xmm7, %xmm7 + movd %xmm7, %r10d + +/* + * R + * dN = dM - RShifter + */ + movaps %xmm2, %xmm7 + movd %xmm6, %ecx + +/* + * G1,G2,G3: dTdif,dTn * 2^N,2^(-N) + * NB: copied from sinh_la - to be optimized!!!!! 
+ */ + psllq $44, %xmm2 + pshufd $1, %xmm6, %xmm6 + +/* lM now is an EXP(2^N) */ + pand 3584+__svml_dcosh_data_internal(%rip), %xmm2 + subpd %xmm3, %xmm7 + movd %xmm6, %r8d + +/* dR = dX - dN*Log2_hi/2^K */ + movups 3072+__svml_dcosh_data_internal(%rip), %xmm3 + mulpd %xmm7, %xmm3 + subpd %xmm3, %xmm4 + +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */ + movups 3136+__svml_dcosh_data_internal(%rip), %xmm3 + mulpd %xmm7, %xmm3 + subpd %xmm3, %xmm4 + +/* dR2 = dR^2 */ + movaps %xmm4, %xmm3 + mulpd %xmm4, %xmm3 + movslq %ecx, %rcx + movslq %r9d, %r9 + movslq %r8d, %r8 + movslq %r10d, %r10 + movsd (%rax,%rcx), %xmm0 + movsd (%rax,%r9), %xmm6 + movhpd (%rax,%r8), %xmm0 + movhpd (%rax,%r10), %xmm6 + paddq %xmm2, %xmm0 + +/* */ + psubq %xmm2, %xmm6 + +/* + * sinh(r) = r +r*r^2*a3 .... + * dSinh_r = r^2*a3 + */ + movups 3392+__svml_dcosh_data_internal(%rip), %xmm2 + mulpd %xmm3, %xmm2 + +/* lX- = EXP(1/2) */ + psubq %xmm1, %xmm6 + +/* dSinh_r = r + r*r^2*a3 */ + mulpd %xmm4, %xmm2 + +/* dTn = dTn*2^N - dTn*2^-N */ + movaps %xmm0, %xmm1 + subpd %xmm6, %xmm1 + addpd %xmm2, %xmm4 + +/* dTp = dTn*2^N + dTn*2^-N */ + addpd %xmm6, %xmm0 + mulpd %xmm4, %xmm1 + +/* poly(r) = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */ + movups 3456+__svml_dcosh_data_internal(%rip), %xmm4 + mulpd %xmm3, %xmm4 + addpd 3328+__svml_dcosh_data_internal(%rip), %xmm4 + mulpd %xmm4, %xmm3 + +/* dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */ + mulpd %xmm0, %xmm3 + addpd %xmm3, %xmm1 + +/* _VRES1 = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */ + addpd %xmm1, %xmm0 + andl $3, %edx + +/* Ret H */ + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm5, 192(%rsp) + movups %xmm0, 256(%rsp) + je .LBL_1_2 + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups 
%xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $2, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), 
%rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,8), %rdi + lea 256(%rsp,%r12,8), %rsi + call __svml_dcosh_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN2v_cosh_sse4) + + .align 16,0x90 + +__svml_dcosh_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 6(%rdi), %edx + xorl %eax, %eax + andl $32752, %edx + cmpl $32752, %edx + je .LBL_2_12 + movq (%rdi), %rdx + movq %rdx, -8(%rsp) + shrq $56, %rdx + andl $127, %edx + movb %dl, 
-1(%rsp) + movzwl -2(%rsp), %ecx + andl $32752, %ecx + cmpl $15504, %ecx + jle .LBL_2_10 + movsd -8(%rsp), %xmm0 + movsd 1096+__dcosh_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_9 + movq 1128+__dcosh_la_CoutTab(%rip), %rdx + movq %rdx, -8(%rsp) + comisd 1144+__dcosh_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movsd 1040+__dcosh_la_CoutTab(%rip), %xmm1 + lea __dcosh_la_CoutTab(%rip), %r9 + mulsd %xmm0, %xmm1 + addsd 1048+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movsd 1088+__dcosh_la_CoutTab(%rip), %xmm1 + movl -40(%rsp), %edx + movl %edx, %esi + andl $63, %esi + subsd 1048+__dcosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea (%rsi,%rsi), %ecx + movsd -32(%rsp), %xmm3 + lea 1(%rsi,%rsi), %edi + mulsd 1104+__dcosh_la_CoutTab(%rip), %xmm3 + movsd -32(%rsp), %xmm4 + subsd %xmm3, %xmm0 + mulsd 1112+__dcosh_la_CoutTab(%rip), %xmm4 + shrl $6, %edx + subsd %xmm4, %xmm0 + mulsd %xmm0, %xmm1 + addl $1022, %edx + andl $2047, %edx + addsd 1080+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1072+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1064+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1056+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + addsd %xmm0, %xmm1 + movsd (%r9,%rcx,8), %xmm0 + mulsd %xmm0, %xmm1 + addsd (%r9,%rdi,8), %xmm1 + addsd %xmm0, %xmm1 + cmpl $2046, %edx + ja .LBL_2_7 + movq 1128+__dcosh_la_CoutTab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm1 + mulsd 1024+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, (%r8) + ret + +.LBL_2_8: + movsd 1040+__dcosh_la_CoutTab(%rip), %xmm1 + lea __dcosh_la_CoutTab(%rip), %rcx + 
movzwl -2(%rsp), %esi + andl $-32753, %esi + movsd 1080+__dcosh_la_CoutTab(%rip), %xmm14 + mulsd %xmm0, %xmm1 + addsd 1048+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movl -40(%rsp), %r10d + movl %r10d, %r9d + shrl $6, %r9d + subsd 1048+__dcosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea 1023(%r9), %edi + andl $63, %r10d + addl $1022, %r9d + movsd -32(%rsp), %xmm3 + andl $2047, %r9d + negl %edi + shll $4, %r9d + addl $-4, %edi + mulsd 1104+__dcosh_la_CoutTab(%rip), %xmm3 + lea (%r10,%r10), %edx + movsd (%rcx,%rdx,8), %xmm15 + negl %edx + movsd -32(%rsp), %xmm4 + orl %r9d, %esi + andl $2047, %edi + lea 1(%r10,%r10), %r11d + mulsd 1112+__dcosh_la_CoutTab(%rip), %xmm4 + subsd %xmm3, %xmm0 + movw %si, -2(%rsp) + andl $-32753, %esi + shll $4, %edi + subsd %xmm4, %xmm0 + movsd -8(%rsp), %xmm6 + orl %edi, %esi + movw %si, -2(%rsp) + lea 128(%rdx), %esi + mulsd %xmm6, %xmm15 + movaps %xmm0, %xmm5 + mulsd %xmm0, %xmm5 + movsd -8(%rsp), %xmm7 + movaps %xmm15, %xmm8 + movsd (%rcx,%rsi,8), %xmm11 + addl $129, %edx + mulsd %xmm7, %xmm11 + movaps %xmm15, %xmm10 + mulsd %xmm5, %xmm14 + addsd %xmm11, %xmm8 + subsd %xmm11, %xmm15 + addsd 1064+__dcosh_la_CoutTab(%rip), %xmm14 + movsd %xmm8, -24(%rsp) + movsd (%rcx,%r11,8), %xmm12 + movsd (%rcx,%rdx,8), %xmm13 + movsd -24(%rsp), %xmm9 + mulsd %xmm6, %xmm12 + subsd %xmm9, %xmm10 + mulsd %xmm7, %xmm13 + mulsd %xmm5, %xmm14 + addsd %xmm11, %xmm10 + mulsd %xmm0, %xmm14 + movsd 1088+__dcosh_la_CoutTab(%rip), %xmm1 + movaps %xmm12, %xmm11 + mulsd %xmm5, %xmm1 + subsd %xmm13, %xmm12 + mulsd %xmm15, %xmm14 + mulsd %xmm0, %xmm12 + addsd 1072+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm15, %xmm0 + mulsd %xmm5, %xmm1 + addsd %xmm12, %xmm11 + movsd %xmm10, -16(%rsp) + addsd %xmm13, %xmm11 + addsd 1056+__dcosh_la_CoutTab(%rip), %xmm1 + addsd %xmm14, %xmm11 + mulsd %xmm5, %xmm1 + addsd %xmm0, %xmm11 + movsd -24(%rsp), %xmm3 + mulsd %xmm3, %xmm1 + movsd -16(%rsp), %xmm2 + addsd %xmm1, %xmm11 + 
addsd %xmm2, %xmm11 + movsd %xmm11, -24(%rsp) + movsd -24(%rsp), %xmm0 + addsd %xmm0, %xmm3 + movsd %xmm3, (%r8) + ret + +.LBL_2_9: + movsd 1120+__dcosh_la_CoutTab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_10: + movsd 1136+__dcosh_la_CoutTab(%rip), %xmm0 + addsd -8(%rsp), %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_12: + movsd (%rdi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + ret + + cfi_endproc + + .type __svml_dcosh_cout_rare_internal,@function + .size __svml_dcosh_cout_rare_internal,.-__svml_dcosh_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dcosh_data_internal: + .long 0 + .long 1071644672 + .long 1828292879 + .long 1071691096 + .long 1014845819 + .long 1071739576 + .long 1853186616 + .long 1071790202 + .long 171030293 + .long 1071843070 + .long 1276261410 + .long 1071898278 + .long 3577096743 + .long 1071955930 + .long 3712504873 + .long 1072016135 + .long 1719614413 + .long 1072079006 + .long 1944781191 + .long 1072144660 + .long 1110089947 + .long 1072213221 + .long 2191782032 + .long 1072284817 + .long 2572866477 + .long 1072359583 + .long 3716502172 + .long 1072437659 + .long 3707479175 + .long 1072519192 + .long 2728693978 + .long 1072604335 + .long 0 + .long 0 + .long 1255956747 + .long 1015588398 + .long 3117910646 + .long 3161559105 + .long 3066496371 + .long 1015656574 + .long 3526460132 + .long 1014428778 + .long 300981948 + .long 1014684169 + .long 2951496418 + .long 1013793687 + .long 88491949 + .long 1015427660 + .long 330458198 + .long 3163282740 + .long 3993278767 + .long 3161724279 + .long 1451641639 + .long 1015474673 + .long 2960257726 + .long 1013742662 + .long 878562433 + .long 1015521741 + .long 2303740125 + .long 1014042725 + .long 3613079303 + .long 1014164738 + .long 396109971 + .long 3163462691 + .long 0 + .long 1071644672 + .long 2728693978 + .long 1071555759 + .long 3707479175 + .long 1071470616 + .long 3716502172 + .long 1071389083 + .long 2572866477 + .long 
1071311007 + .long 2191782032 + .long 1071236241 + .long 1110089947 + .long 1071164645 + .long 1944781191 + .long 1071096084 + .long 1719614413 + .long 1071030430 + .long 3712504873 + .long 1070967559 + .long 3577096743 + .long 1070907354 + .long 1276261410 + .long 1070849702 + .long 171030293 + .long 1070794494 + .long 1853186616 + .long 1070741626 + .long 1014845819 + .long 1070691000 + .long 1828292879 + .long 1070642520 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 15 + .long 0 + .long 15 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 
+ .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1071644672 + .long 4200250559 + .long 1071647514 + .long 2851812149 + .long 1071650365 + .long 339411585 + .long 1071653224 + .long 1048019041 + .long 1071656090 + .long 772914124 + .long 1071658964 + .long 3899555717 + .long 1071661845 + .long 1928746161 + .long 1071664735 + .long 3541402996 + .long 1071667632 + .long 238821257 + .long 1071670538 + .long 702412510 + .long 1071673451 + .long 728934454 + .long 1071676372 + .long 410360776 + .long 1071679301 + .long 4133881824 + .long 1071682237 + .long 3402036099 + .long 1071685182 + .long 2602514713 + .long 1071688135 + .long 1828292879 + .long 1071691096 + .long 1172597893 + .long 1071694065 + .long 728909815 + .long 1071697042 + .long 590962156 + .long 1071700027 + .long 852742562 + .long 1071703020 + .long 1608493509 + .long 1071706021 + .long 2952712987 + .long 1071709030 + .long 685187902 + .long 1071712048 + .long 3490863953 + .long 1071715073 + .long 2875075254 + .long 1071718107 + .long 3228316108 + .long 1071721149 + .long 351405227 + .long 1071724200 + .long 2930322912 + .long 1071727258 + .long 2471440686 + .long 1071730325 + .long 3366293073 + .long 1071733400 + .long 1416741826 + .long 
1071736484 + .long 1014845819 + .long 1071739576 + .long 2257959872 + .long 1071742676 + .long 948735466 + .long 1071745785 + .long 1480023343 + .long 1071748902 + .long 3949972341 + .long 1071752027 + .long 4162030108 + .long 1071755161 + .long 2214878420 + .long 1071758304 + .long 2502433899 + .long 1071761455 + .long 828946858 + .long 1071764615 + .long 1588871207 + .long 1071767783 + .long 586995997 + .long 1071770960 + .long 2218315341 + .long 1071774145 + .long 2288159958 + .long 1071777339 + .long 897099801 + .long 1071780542 + .long 2440944790 + .long 1071783753 + .long 2725843665 + .long 1071786973 + .long 1853186616 + .long 1071790202 + .long 4219606026 + .long 1071793439 + .long 1337108031 + .long 1071796686 + .long 1897844341 + .long 1071799941 + .long 1709341917 + .long 1071803205 + .long 874372905 + .long 1071806478 + .long 3790955393 + .long 1071809759 + .long 1972484976 + .long 1071813050 + .long 4112506593 + .long 1071816349 + .long 1724976915 + .long 1071819658 + .long 3504003472 + .long 1071822975 + .long 964107055 + .long 1071826302 + .long 2799960843 + .long 1071829637 + .long 526652809 + .long 1071832982 + .long 2839424854 + .long 1071836335 + .long 1253935211 + .long 1071839698 + .long 171030293 + .long 1071843070 + .long 3991843581 + .long 1071846450 + .long 4232894513 + .long 1071849840 + .long 1000925746 + .long 1071853240 + .long 2992903935 + .long 1071856648 + .long 1726216749 + .long 1071860066 + .long 1603444721 + .long 1071863493 + .long 2732492859 + .long 1071866929 + .long 926591435 + .long 1071870375 + .long 589198666 + .long 1071873830 + .long 1829099622 + .long 1071877294 + .long 460407023 + .long 1071880768 + .long 887463927 + .long 1071884251 + .long 3219942644 + .long 1071887743 + .long 3272845541 + .long 1071891245 + .long 1156440435 + .long 1071894757 + .long 1276261410 + .long 1071898278 + .long 3743175029 + .long 1071901808 + .long 78413852 + .long 1071905349 + .long 3278348324 + .long 1071908898 + .long 569847338 + .long 
1071912458 + .long 654919306 + .long 1071916027 + .long 3645941911 + .long 1071919605 + .long 1065662932 + .long 1071923194 + .long 1617004845 + .long 1071926792 + .long 1118294578 + .long 1071930400 + .long 3978100823 + .long 1071934017 + .long 1720398391 + .long 1071937645 + .long 3049340112 + .long 1071941282 + .long 3784486610 + .long 1071944929 + .long 4040676318 + .long 1071948586 + .long 3933059031 + .long 1071952253 + .long 3577096743 + .long 1071955930 + .long 3088564500 + .long 1071959617 + .long 2583551245 + .long 1071963314 + .long 2178460671 + .long 1071967021 + .long 1990012071 + .long 1071970738 + .long 2135241198 + .long 1071974465 + .long 2731501122 + .long 1071978202 + .long 3896463087 + .long 1071981949 + .long 1453150082 + .long 1071985707 + .long 4109806887 + .long 1071989474 + .long 3395129871 + .long 1071993252 + .long 3723038930 + .long 1071997040 + .long 917841882 + .long 1072000839 + .long 3689071823 + .long 1072004647 + .long 3566716925 + .long 1072008466 + .long 671025100 + .long 1072012296 + .long 3712504873 + .long 1072016135 + .long 4222122499 + .long 1072019985 + .long 2321106615 + .long 1072023846 + .long 2425981843 + .long 1072027717 + .long 363667784 + .long 1072031599 + .long 551349105 + .long 1072035491 + .long 3111574537 + .long 1072039393 + .long 3872257780 + .long 1072043306 + .long 2956612997 + .long 1072047230 + .long 488188413 + .long 1072051165 + .long 885834528 + .long 1072055110 + .long 4273770423 + .long 1072059065 + .long 2186617381 + .long 1072063032 + .long 3339203574 + .long 1072067009 + .long 3561793907 + .long 1072070997 + .long 2979960120 + .long 1072074996 + .long 1719614413 + .long 1072079006 + .long 4201977662 + .long 1072083026 + .long 1963711167 + .long 1072087058 + .long 3721688645 + .long 1072091100 + .long 1013258799 + .long 1072095154 + .long 2555984613 + .long 1072099218 + .long 4182873220 + .long 1072103293 + .long 1727278727 + .long 1072107380 + .long 3907805044 + .long 1072111477 + .long 2263535754 
+ .long 1072115586 + .long 1218806132 + .long 1072119706 + .long 903334909 + .long 1072123837 + .long 1447192521 + .long 1072127979 + .long 2980802057 + .long 1072132132 + .long 1339972927 + .long 1072136297 + .long 950803702 + .long 1072140473 + .long 1944781191 + .long 1072144660 + .long 158781403 + .long 1072148859 + .long 19972402 + .long 1072153069 + .long 1660913392 + .long 1072157290 + .long 919555682 + .long 1072161523 + .long 2224145553 + .long 1072165767 + .long 1413356050 + .long 1072170023 + .long 2916157145 + .long 1072174290 + .long 2571947539 + .long 1072178569 + .long 515457527 + .long 1072182860 + .long 1176749997 + .long 1072187162 + .long 396319521 + .long 1072191476 + .long 2604962541 + .long 1072195801 + .long 3643909174 + .long 1072200138 + .long 3649726105 + .long 1072204487 + .long 2759350287 + .long 1072208848 + .long 1110089947 + .long 1072213221 + .long 3134592888 + .long 1072217605 + .long 380978316 + .long 1072222002 + .long 1577608921 + .long 1072226410 + .long 2568320822 + .long 1072230830 + .long 3492293770 + .long 1072235262 + .long 194117574 + .long 1072239707 + .long 1403662306 + .long 1072244163 + .long 2966275557 + .long 1072248631 + .long 727685349 + .long 1072253112 + .long 3418903055 + .long 1072257604 + .long 2591453363 + .long 1072262109 + .long 2682146384 + .long 1072266626 + .long 3833209506 + .long 1072271155 + .long 1892288442 + .long 1072275697 + .long 1297350157 + .long 1072280251 + .long 2191782032 + .long 1072284817 + .long 424392917 + .long 1072289396 + .long 434316067 + .long 1072293987 + .long 2366108318 + .long 1072298590 + .long 2069751141 + .long 1072303206 + .long 3985553595 + .long 1072307834 + .long 3964284211 + .long 1072312475 + .long 2152073944 + .long 1072317129 + .long 2990417245 + .long 1072321795 + .long 2331271250 + .long 1072326474 + .long 321958744 + .long 1072331166 + .long 1405169241 + .long 1072335870 + .long 1434058175 + .long 1072340587 + .long 557149882 + .long 1072345317 + .long 3218338682 
+ .long 1072350059 + .long 977020788 + .long 1072354815 + .long 2572866477 + .long 1072359583 + .long 3861050111 + .long 1072364364 + .long 697153126 + .long 1072369159 + .long 1822067026 + .long 1072373966 + .long 3092190715 + .long 1072378786 + .long 364333489 + .long 1072383620 + .long 2380618042 + .long 1072388466 + .long 703710506 + .long 1072393326 + .long 4076559943 + .long 1072398198 + .long 4062661092 + .long 1072403084 + .long 815859274 + .long 1072407984 + .long 3080351519 + .long 1072412896 + .long 2420883922 + .long 1072417822 + .long 3287523847 + .long 1072422761 + .long 1540824585 + .long 1072427714 + .long 1631695677 + .long 1072432680 + .long 3716502172 + .long 1072437659 + .long 3657065772 + .long 1072442652 + .long 1610600570 + .long 1072447659 + .long 2029714210 + .long 1072452679 + .long 777507147 + .long 1072457713 + .long 2307442995 + .long 1072462760 + .long 2483480501 + .long 1072467821 + .long 1464976603 + .long 1072472896 + .long 3706687593 + .long 1072477984 + .long 778901109 + .long 1072483087 + .long 1432208378 + .long 1072488203 + .long 1532734324 + .long 1072493333 + .long 1242007932 + .long 1072498477 + .long 721996136 + .long 1072503635 + .long 135105010 + .long 1072508807 + .long 3939148246 + .long 1072513992 + .long 3707479175 + .long 1072519192 + .long 3898795731 + .long 1072524406 + .long 382305176 + .long 1072529635 + .long 1912561781 + .long 1072534877 + .long 64696965 + .long 1072540134 + .long 3594158869 + .long 1072545404 + .long 4076975200 + .long 1072550689 + .long 1679558232 + .long 1072555989 + .long 863738719 + .long 1072561303 + .long 1796832535 + .long 1072566631 + .long 351641897 + .long 1072571974 + .long 991358482 + .long 1072577331 + .long 3884662774 + .long 1072582702 + .long 610758006 + .long 1072588089 + .long 4224142467 + .long 1072593489 + .long 2009970496 + .long 1072598905 + .long 2728693978 + .long 1072604335 + .long 2256325230 + .long 1072609780 + .long 764307441 + .long 1072615240 + .long 2719515920 + 
.long 1072620714 + .long 3999357479 + .long 1072626203 + .long 481706282 + .long 1072631708 + .long 929806999 + .long 1072637227 + .long 1222472308 + .long 1072642761 + .long 1533953344 + .long 1072648310 + .long 2038973688 + .long 1072653874 + .long 2912730644 + .long 1072659453 + .long 35929225 + .long 1072665048 + .long 2174652632 + .long 1072670657 + .long 915592468 + .long 1072676282 + .long 730821105 + .long 1072681922 + .long 1797923801 + .long 1072687577 + .long 0 + .long 1072693248 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 255 + .long 255 + .long 255 + .long 255 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 256 + .long 256 + .long 256 + .long 256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + 
.long 2146435072 + .long 0 + .long 2146435072 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dcosh_data_internal,@object + .size __svml_dcosh_data_internal,3776 + .space 832, 0x00 + .align 32 + +__dcosh_la_CoutTab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 1398474845 + .long 3161559171 + .long 3541402996 + .long 1072716208 + .long 2759177317 + .long 1015903202 + .long 410360776 + .long 1072727877 + 
.long 1269990655 + .long 1013024446 + .long 1828292879 + .long 1072739672 + .long 1255956747 + .long 1016636974 + .long 852742562 + .long 1072751596 + .long 667253587 + .long 1010842135 + .long 3490863953 + .long 1072763649 + .long 960797498 + .long 3163997456 + .long 2930322912 + .long 1072775834 + .long 2599499422 + .long 3163762623 + .long 1014845819 + .long 1072788152 + .long 3117910646 + .long 3162607681 + .long 3949972341 + .long 1072800603 + .long 2068408548 + .long 1015962444 + .long 828946858 + .long 1072813191 + .long 10642492 + .long 1016988014 + .long 2288159958 + .long 1072825915 + .long 2169144469 + .long 1015924597 + .long 1853186616 + .long 1072838778 + .long 3066496371 + .long 1016705150 + .long 1709341917 + .long 1072851781 + .long 2571168217 + .long 1015201075 + .long 4112506593 + .long 1072864925 + .long 2947355221 + .long 1015419624 + .long 2799960843 + .long 1072878213 + .long 1423655381 + .long 1016070727 + .long 171030293 + .long 1072891646 + .long 3526460132 + .long 1015477354 + .long 2992903935 + .long 1072905224 + .long 2218154406 + .long 1016276769 + .long 926591435 + .long 1072918951 + .long 3208833762 + .long 3163962090 + .long 887463927 + .long 1072932827 + .long 3596744163 + .long 3161842742 + .long 1276261410 + .long 1072946854 + .long 300981948 + .long 1015732745 + .long 569847338 + .long 1072961034 + .long 472945272 + .long 3160339305 + .long 1617004845 + .long 1072975368 + .long 82804944 + .long 1011391354 + .long 3049340112 + .long 1072989858 + .long 3062915824 + .long 1014219171 + .long 3577096743 + .long 1073004506 + .long 2951496418 + .long 1014842263 + .long 1990012071 + .long 1073019314 + .long 3529070563 + .long 3163861769 + .long 1453150082 + .long 1073034283 + .long 498154669 + .long 3162536638 + .long 917841882 + .long 1073049415 + .long 18715565 + .long 1016707884 + .long 3712504873 + .long 1073064711 + .long 88491949 + .long 1016476236 + .long 363667784 + .long 1073080175 + .long 813753950 + .long 1016833785 + .long 
2956612997 + .long 1073095806 + .long 2118169751 + .long 3163784129 + .long 2186617381 + .long 1073111608 + .long 2270764084 + .long 3164321289 + .long 1719614413 + .long 1073127582 + .long 330458198 + .long 3164331316 + .long 1013258799 + .long 1073143730 + .long 1748797611 + .long 3161177658 + .long 3907805044 + .long 1073160053 + .long 2257091225 + .long 3162598983 + .long 1447192521 + .long 1073176555 + .long 1462857171 + .long 3163563097 + .long 1944781191 + .long 1073193236 + .long 3993278767 + .long 3162772855 + .long 919555682 + .long 1073210099 + .long 3121969534 + .long 1013996802 + .long 2571947539 + .long 1073227145 + .long 3558159064 + .long 3164425245 + .long 2604962541 + .long 1073244377 + .long 2614425274 + .long 3164587768 + .long 1110089947 + .long 1073261797 + .long 1451641639 + .long 1016523249 + .long 2568320822 + .long 1073279406 + .long 2732824428 + .long 1015401491 + .long 2966275557 + .long 1073297207 + .long 2176155324 + .long 3160891335 + .long 2682146384 + .long 1073315202 + .long 2082178513 + .long 3164411995 + .long 2191782032 + .long 1073333393 + .long 2960257726 + .long 1014791238 + .long 2069751141 + .long 1073351782 + .long 1562170675 + .long 3163773257 + .long 2990417245 + .long 1073370371 + .long 3683467745 + .long 3164417902 + .long 1434058175 + .long 1073389163 + .long 251133233 + .long 1016134345 + .long 2572866477 + .long 1073408159 + .long 878562433 + .long 1016570317 + .long 3092190715 + .long 1073427362 + .long 814012168 + .long 3160571998 + .long 4076559943 + .long 1073446774 + .long 2119478331 + .long 3161806927 + .long 2420883922 + .long 1073466398 + .long 2049810052 + .long 1015168464 + .long 3716502172 + .long 1073486235 + .long 2303740125 + .long 1015091301 + .long 777507147 + .long 1073506289 + .long 4282924205 + .long 1016236109 + .long 3706687593 + .long 1073526560 + .long 3521726939 + .long 1014301643 + .long 1242007932 + .long 1073547053 + .long 1132034716 + .long 3164388407 + .long 3707479175 + .long 1073567768 
+ .long 3613079303 + .long 1015213314 + .long 64696965 + .long 1073588710 + .long 1768797490 + .long 1016865536 + .long 863738719 + .long 1073609879 + .long 1326992220 + .long 3163661773 + .long 3884662774 + .long 1073631278 + .long 2158611599 + .long 1015258761 + .long 2728693978 + .long 1073652911 + .long 396109971 + .long 3164511267 + .long 3999357479 + .long 1073674779 + .long 2258941616 + .long 1016973300 + .long 1533953344 + .long 1073696886 + .long 769171851 + .long 1016714209 + .long 2174652632 + .long 1073719233 + .long 4087714590 + .long 1015498835 + .long 0 + .long 1073741824 + .long 0 + .long 0 + .long 1697350398 + .long 1079448903 + .long 0 + .long 1127743488 + .long 0 + .long 1071644672 + .long 1431652600 + .long 1069897045 + .long 1431670732 + .long 1067799893 + .long 984555731 + .long 1065423122 + .long 472530941 + .long 1062650218 + .long 2411329662 + .long 1082536910 + .long 4277796864 + .long 1065758274 + .long 3164486458 + .long 1025308570 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 3875694624 + .long 1077247184 + .type __dcosh_la_CoutTab,@object + .size __dcosh_la_CoutTab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S new file mode 100644 index 0000000000..4410d34583 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized cosh, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define _ZGVdN4v_cosh _ZGVdN4v_cosh_sse_wrapper +#include "../svml_d_cosh4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c new file mode 100644 index 0000000000..c4f59206a9 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized cosh, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/.
*/ + +#define SYMBOL_NAME _ZGVdN4v_cosh +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4v_cosh, __GI__ZGVdN4v_cosh, __redirect__ZGVdN4v_cosh) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S new file mode 100644 index 0000000000..f511255a04 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S @@ -0,0 +1,2250 @@ +/* Function cosh vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * + * Compute cosh(x) as (exp(x)+exp(-x))/2, + * where exp is calculated as + * exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r) + * + * Special cases: + * + * cosh(NaN) = quiet NaN, and raise invalid exception + * cosh(INF) = that INF + * cosh(0) = 1 + * cosh(x) overflows for big x and returns MAXLOG+log(2) + * + */ + +#include + + .text +ENTRY(_ZGVdN4v_cosh_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + lea 896+__svml_dcosh_data_internal(%rip), %rax + vmovupd 3648+__svml_dcosh_data_internal(%rip), %ymm5 + vmovupd 3200+__svml_dcosh_data_internal(%rip), %ymm6 + +/* + * Load argument + * dM = x*2^K/log(2) + RShifter + */ + vmovupd 3008+__svml_dcosh_data_internal(%rip), %ymm3 + vmovups %ymm9, 96(%rsp) + vmovups %ymm10, 160(%rsp) + vmovups %ymm12, 224(%rsp) + vmovups %ymm11, 192(%rsp) + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + +/* + * trick + * 256=-iIndex + */ + vmovups 3520+__svml_dcosh_data_internal(%rip), %xmm11 + vmovups %ymm13, 256(%rsp) + vmovups %ymm8, 32(%rsp) + vmovups %ymm14, 288(%rsp) + vmovups %ymm15, 320(%rsp) + vmovapd %ymm0, %ymm7 + +/* Abs argument */ + vandnpd %ymm7, %ymm5, %ymm4 + vfmadd213pd %ymm6, %ymm4, %ymm3 + +/* dXSign=0x001000000000 */ + vpsrlq $11, %ymm5, %ymm5 + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 
0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +/* Index and lookup */ + vextractf128 $1, %ymm3, %xmm9 + +/* + * R + * dN = dM - RShifter + */ + vsubpd %ymm6, %ymm3, %ymm6 + vshufps $136, %xmm9, %xmm3, %xmm10 + vpand 3264+__svml_dcosh_data_internal(%rip), %xmm10, %xmm12 + +/* iIndex*=3 */ + vpslld $3, %xmm12, %xmm0 + vpsubd %xmm12, %xmm11, %xmm13 + vmovd %xmm0, %ecx + +/* + * G1,G2,G3: dTdif,dTn * 2^N,2^(-N) + * NB: copied from sinh_la - to be optimized!!!!! + */ + vpsllq $44, %ymm3, %ymm3 + movslq %ecx, %rcx + vpextrd $1, %xmm0, %r8d + movslq %r8d, %r8 + vpextrd $2, %xmm0, %r9d + movslq %r9d, %r9 + vmovsd (%rax,%rcx), %xmm14 + vmovhpd (%rax,%r8), %xmm14, %xmm15 + vpextrd $3, %xmm0, %r10d + movslq %r10d, %r10 + +/* + * Check for overflow\underflow + * + */ + vextractf128 $1, %ymm4, %xmm1 + vshufps $221, %xmm1, %xmm4, %xmm2 + +/* dR = dX - dN*Log2_hi/2^K */ + vfnmadd231pd 3072+__svml_dcosh_data_internal(%rip), %ymm6, %ymm4 + vpcmpgtd 3712+__svml_dcosh_data_internal(%rip), %xmm2, %xmm8 + +/* iDomainRange*=3 */ + vpslld $3, %xmm13, %xmm2 + vmovsd (%rax,%r9), %xmm1 + vmovd %xmm2, %r11d + vmovmskps %xmm8, %edx + vmovhpd (%rax,%r10), %xmm1, %xmm8 + +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */ + vfnmadd231pd 3136+__svml_dcosh_data_internal(%rip), %ymm6, %ymm4 + vpextrd $2, %xmm2, %r8d + movslq %r11d, %r11 + vpextrd $1, %xmm2, %ecx + movslq %r8d, %r8 + vpextrd $3, %xmm2, %r9d + movslq %ecx, %rcx + movslq %r9d, %r9 + +/* dR2 = dR^2 */ + vmulpd %ymm4, %ymm4, %ymm0 + vmovsd (%rax,%r11), %xmm9 + vmovsd (%rax,%r8), %xmm11 + vmovhpd (%rax,%rcx), %xmm9, %xmm10 + vmovhpd (%rax,%r9), %xmm11, %xmm2 + +/* + * sinh(r) = r +r*r^2*a3 .... 
+ * dSinh_r = r^2*a3
+ */
+        vmulpd 3392+__svml_dcosh_data_internal(%rip), %ymm0, %ymm9
+
+/* dSinh_r = r + r*r^2*a3 */
+        vfmadd213pd %ymm4, %ymm4, %ymm9
+
+/* poly(r) = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        vmovupd 3456+__svml_dcosh_data_internal(%rip), %ymm4
+        vfmadd213pd 3328+__svml_dcosh_data_internal(%rip), %ymm0, %ymm4
+        vmulpd %ymm4, %ymm0, %ymm0
+        vinsertf128 $1, %xmm2, %ymm10, %ymm12
+
+/* lM now is an EXP(2^N) */
+        vpand 3584+__svml_dcosh_data_internal(%rip), %ymm3, %ymm2
+        vinsertf128 $1, %xmm8, %ymm15, %ymm1
+        vpaddq %ymm2, %ymm1, %ymm6
+
+/* 2^(-N) term: subtract lM from the exponent field */
+        vpsubq %ymm2, %ymm12, %ymm1
+
+/* lX- = EXP(1/2): halve by subtracting dXSign >> 11 from the exponent field */
+        vpsubq %ymm5, %ymm1, %ymm5
+
+/* dTn = dTn*2^N - dTn*2^-N */
+        vsubpd %ymm5, %ymm6, %ymm8
+
+/* dTp = dTn*2^N + dTn*2^-N */
+        vaddpd %ymm5, %ymm6, %ymm11
+        vmulpd %ymm9, %ymm8, %ymm10
+
+/* dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        vfmadd213pd %ymm10, %ymm11, %ymm0
+
+/* _VRES1 = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        vaddpd %ymm0, %ymm11, %ymm0
+
+/* Ret H */
+        testl %edx, %edx
+        jne .LBL_1_3
+
+.LBL_1_2:
+        vmovups 32(%rsp), %ymm8
+        cfi_restore(91)
+        vmovups 96(%rsp), %ymm9
+        cfi_restore(92)
+        vmovups 160(%rsp), %ymm10
+        cfi_restore(93)
+        vmovups 192(%rsp), %ymm11
+        cfi_restore(94)
+        vmovups 224(%rsp), %ymm12
+        cfi_restore(95)
+        vmovups 256(%rsp), %ymm13
+        cfi_restore(96)
+        vmovups 288(%rsp), %ymm14
+        cfi_restore(97)
+        vmovups 320(%rsp), %ymm15
+        cfi_restore(98)
+        movq %rbp, %rsp
+        popq %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22
+        .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22
+        .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22
+        .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff,
0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovupd %ymm7, 64(%rsp) + vmovupd %ymm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovupd 128(%rsp), %ymm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 
128(%rsp,%r12,8), %rsi + call __svml_dcosh_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVdN4v_cosh_avx2) + + .align 16,0x90 + +__svml_dcosh_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 6(%rdi), %edx + xorl %eax, %eax + andl $32752, %edx + cmpl $32752, %edx + je .LBL_2_12 + movq (%rdi), %rdx + movq %rdx, -8(%rsp) + shrq $56, %rdx + andl $127, %edx + movb %dl, -1(%rsp) + movzwl -2(%rsp), %ecx + andl $32752, %ecx + cmpl $15504, %ecx + jle .LBL_2_10 + movsd -8(%rsp), %xmm0 + movsd 1096+__dcosh_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_9 + movq 1128+__dcosh_la_CoutTab(%rip), %rdx + movq %rdx, -8(%rsp) + comisd 1144+__dcosh_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movsd 1040+__dcosh_la_CoutTab(%rip), %xmm1 + lea __dcosh_la_CoutTab(%rip), %r9 + mulsd %xmm0, %xmm1 + addsd 1048+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movsd 1088+__dcosh_la_CoutTab(%rip), %xmm1 + movl -40(%rsp), %edx + movl %edx, %esi + andl $63, %esi + subsd 1048+__dcosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea (%rsi,%rsi), %ecx + movsd -32(%rsp), %xmm3 + lea 1(%rsi,%rsi), %edi + mulsd 1104+__dcosh_la_CoutTab(%rip), %xmm3 + movsd -32(%rsp), %xmm4 + subsd %xmm3, %xmm0 + mulsd 1112+__dcosh_la_CoutTab(%rip), %xmm4 + shrl $6, %edx + subsd %xmm4, %xmm0 + mulsd %xmm0, %xmm1 + addl $1022, %edx + andl $2047, %edx + addsd 1080+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1072+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1064+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1056+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + addsd %xmm0, %xmm1 + movsd (%r9,%rcx,8), %xmm0 + mulsd %xmm0, %xmm1 + addsd (%r9,%rdi,8), %xmm1 + addsd %xmm0, %xmm1 + cmpl $2046, %edx + ja .LBL_2_7 + movq 1128+__dcosh_la_CoutTab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + 
movsd %xmm0, (%r8) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm1 + mulsd 1024+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, (%r8) + ret + +.LBL_2_8: + movsd 1040+__dcosh_la_CoutTab(%rip), %xmm1 + lea __dcosh_la_CoutTab(%rip), %rcx + movzwl -2(%rsp), %esi + andl $-32753, %esi + movsd 1080+__dcosh_la_CoutTab(%rip), %xmm14 + mulsd %xmm0, %xmm1 + addsd 1048+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movl -40(%rsp), %r10d + movl %r10d, %r9d + shrl $6, %r9d + subsd 1048+__dcosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea 1023(%r9), %edi + andl $63, %r10d + addl $1022, %r9d + movsd -32(%rsp), %xmm3 + andl $2047, %r9d + negl %edi + shll $4, %r9d + addl $-4, %edi + mulsd 1104+__dcosh_la_CoutTab(%rip), %xmm3 + lea (%r10,%r10), %edx + movsd (%rcx,%rdx,8), %xmm15 + negl %edx + movsd -32(%rsp), %xmm4 + orl %r9d, %esi + andl $2047, %edi + lea 1(%r10,%r10), %r11d + mulsd 1112+__dcosh_la_CoutTab(%rip), %xmm4 + subsd %xmm3, %xmm0 + movw %si, -2(%rsp) + andl $-32753, %esi + shll $4, %edi + subsd %xmm4, %xmm0 + movsd -8(%rsp), %xmm6 + orl %edi, %esi + movw %si, -2(%rsp) + lea 128(%rdx), %esi + mulsd %xmm6, %xmm15 + movaps %xmm0, %xmm5 + mulsd %xmm0, %xmm5 + movsd -8(%rsp), %xmm7 + movaps %xmm15, %xmm8 + movsd (%rcx,%rsi,8), %xmm11 + addl $129, %edx + mulsd %xmm7, %xmm11 + movaps %xmm15, %xmm10 + mulsd %xmm5, %xmm14 + addsd %xmm11, %xmm8 + subsd %xmm11, %xmm15 + addsd 1064+__dcosh_la_CoutTab(%rip), %xmm14 + movsd %xmm8, -24(%rsp) + movsd (%rcx,%r11,8), %xmm12 + movsd (%rcx,%rdx,8), %xmm13 + movsd -24(%rsp), %xmm9 + mulsd %xmm6, %xmm12 + subsd %xmm9, %xmm10 + mulsd %xmm7, %xmm13 + mulsd %xmm5, %xmm14 + addsd %xmm11, %xmm10 + mulsd %xmm0, %xmm14 + movsd 1088+__dcosh_la_CoutTab(%rip), %xmm1 + movaps %xmm12, %xmm11 + mulsd %xmm5, %xmm1 + subsd %xmm13, %xmm12 + mulsd %xmm15, %xmm14 + mulsd 
%xmm0, %xmm12 + addsd 1072+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm15, %xmm0 + mulsd %xmm5, %xmm1 + addsd %xmm12, %xmm11 + movsd %xmm10, -16(%rsp) + addsd %xmm13, %xmm11 + addsd 1056+__dcosh_la_CoutTab(%rip), %xmm1 + addsd %xmm14, %xmm11 + mulsd %xmm5, %xmm1 + addsd %xmm0, %xmm11 + movsd -24(%rsp), %xmm3 + mulsd %xmm3, %xmm1 + movsd -16(%rsp), %xmm2 + addsd %xmm1, %xmm11 + addsd %xmm2, %xmm11 + movsd %xmm11, -24(%rsp) + movsd -24(%rsp), %xmm0 + addsd %xmm0, %xmm3 + movsd %xmm3, (%r8) + ret + +.LBL_2_9: + movsd 1120+__dcosh_la_CoutTab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_10: + movsd 1136+__dcosh_la_CoutTab(%rip), %xmm0 + addsd -8(%rsp), %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_12: + movsd (%rdi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + ret + + cfi_endproc + + .type __svml_dcosh_cout_rare_internal,@function + .size __svml_dcosh_cout_rare_internal,.-__svml_dcosh_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dcosh_data_internal: + .long 0 + .long 1071644672 + .long 1828292879 + .long 1071691096 + .long 1014845819 + .long 1071739576 + .long 1853186616 + .long 1071790202 + .long 171030293 + .long 1071843070 + .long 1276261410 + .long 1071898278 + .long 3577096743 + .long 1071955930 + .long 3712504873 + .long 1072016135 + .long 1719614413 + .long 1072079006 + .long 1944781191 + .long 1072144660 + .long 1110089947 + .long 1072213221 + .long 2191782032 + .long 1072284817 + .long 2572866477 + .long 1072359583 + .long 3716502172 + .long 1072437659 + .long 3707479175 + .long 1072519192 + .long 2728693978 + .long 1072604335 + .long 0 + .long 0 + .long 1255956747 + .long 1015588398 + .long 3117910646 + .long 3161559105 + .long 3066496371 + .long 1015656574 + .long 3526460132 + .long 1014428778 + .long 300981948 + .long 1014684169 + .long 2951496418 + .long 1013793687 + .long 88491949 + .long 1015427660 + .long 330458198 + .long 3163282740 + .long 3993278767 + .long 3161724279 + .long 
1451641639 + .long 1015474673 + .long 2960257726 + .long 1013742662 + .long 878562433 + .long 1015521741 + .long 2303740125 + .long 1014042725 + .long 3613079303 + .long 1014164738 + .long 396109971 + .long 3163462691 + .long 0 + .long 1071644672 + .long 2728693978 + .long 1071555759 + .long 3707479175 + .long 1071470616 + .long 3716502172 + .long 1071389083 + .long 2572866477 + .long 1071311007 + .long 2191782032 + .long 1071236241 + .long 1110089947 + .long 1071164645 + .long 1944781191 + .long 1071096084 + .long 1719614413 + .long 1071030430 + .long 3712504873 + .long 1070967559 + .long 3577096743 + .long 1070907354 + .long 1276261410 + .long 1070849702 + .long 171030293 + .long 1070794494 + .long 1853186616 + .long 1070741626 + .long 1014845819 + .long 1070691000 + .long 1828292879 + .long 1070642520 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1993999322 + 
.long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1071644672 + .long 4200250559 + .long 1071647514 + .long 2851812149 + .long 1071650365 + .long 339411585 + .long 1071653224 + .long 1048019041 + .long 1071656090 + .long 772914124 + .long 1071658964 + .long 3899555717 + .long 1071661845 + .long 1928746161 + .long 1071664735 + .long 3541402996 + .long 1071667632 + .long 238821257 + .long 1071670538 + .long 702412510 + .long 1071673451 + .long 728934454 + .long 1071676372 + .long 410360776 + .long 1071679301 + .long 4133881824 + .long 1071682237 + .long 3402036099 + .long 1071685182 + .long 2602514713 + .long 1071688135 + .long 1828292879 + .long 1071691096 + .long 1172597893 + .long 1071694065 + .long 728909815 + .long 1071697042 + .long 590962156 + .long 1071700027 + .long 852742562 + .long 1071703020 + .long 1608493509 + .long 1071706021 + .long 2952712987 + .long 1071709030 + .long 685187902 + .long 1071712048 + .long 3490863953 + .long 1071715073 + .long 2875075254 + .long 1071718107 + .long 3228316108 + .long 1071721149 + .long 351405227 + .long 1071724200 + .long 2930322912 + .long 1071727258 + .long 2471440686 + .long 1071730325 + .long 3366293073 + .long 1071733400 + .long 1416741826 + .long 1071736484 + .long 1014845819 + .long 1071739576 + .long 2257959872 + .long 1071742676 + .long 948735466 + .long 1071745785 + .long 1480023343 + .long 1071748902 + .long 3949972341 + .long 1071752027 + .long 4162030108 + .long 1071755161 + .long 2214878420 + .long 1071758304 + .long 2502433899 + .long 1071761455 + .long 828946858 + .long 1071764615 + .long 
1588871207 + .long 1071767783 + .long 586995997 + .long 1071770960 + .long 2218315341 + .long 1071774145 + .long 2288159958 + .long 1071777339 + .long 897099801 + .long 1071780542 + .long 2440944790 + .long 1071783753 + .long 2725843665 + .long 1071786973 + .long 1853186616 + .long 1071790202 + .long 4219606026 + .long 1071793439 + .long 1337108031 + .long 1071796686 + .long 1897844341 + .long 1071799941 + .long 1709341917 + .long 1071803205 + .long 874372905 + .long 1071806478 + .long 3790955393 + .long 1071809759 + .long 1972484976 + .long 1071813050 + .long 4112506593 + .long 1071816349 + .long 1724976915 + .long 1071819658 + .long 3504003472 + .long 1071822975 + .long 964107055 + .long 1071826302 + .long 2799960843 + .long 1071829637 + .long 526652809 + .long 1071832982 + .long 2839424854 + .long 1071836335 + .long 1253935211 + .long 1071839698 + .long 171030293 + .long 1071843070 + .long 3991843581 + .long 1071846450 + .long 4232894513 + .long 1071849840 + .long 1000925746 + .long 1071853240 + .long 2992903935 + .long 1071856648 + .long 1726216749 + .long 1071860066 + .long 1603444721 + .long 1071863493 + .long 2732492859 + .long 1071866929 + .long 926591435 + .long 1071870375 + .long 589198666 + .long 1071873830 + .long 1829099622 + .long 1071877294 + .long 460407023 + .long 1071880768 + .long 887463927 + .long 1071884251 + .long 3219942644 + .long 1071887743 + .long 3272845541 + .long 1071891245 + .long 1156440435 + .long 1071894757 + .long 1276261410 + .long 1071898278 + .long 3743175029 + .long 1071901808 + .long 78413852 + .long 1071905349 + .long 3278348324 + .long 1071908898 + .long 569847338 + .long 1071912458 + .long 654919306 + .long 1071916027 + .long 3645941911 + .long 1071919605 + .long 1065662932 + .long 1071923194 + .long 1617004845 + .long 1071926792 + .long 1118294578 + .long 1071930400 + .long 3978100823 + .long 1071934017 + .long 1720398391 + .long 1071937645 + .long 3049340112 + .long 1071941282 + .long 3784486610 + .long 1071944929 + .long 
4040676318 + .long 1071948586 + .long 3933059031 + .long 1071952253 + .long 3577096743 + .long 1071955930 + .long 3088564500 + .long 1071959617 + .long 2583551245 + .long 1071963314 + .long 2178460671 + .long 1071967021 + .long 1990012071 + .long 1071970738 + .long 2135241198 + .long 1071974465 + .long 2731501122 + .long 1071978202 + .long 3896463087 + .long 1071981949 + .long 1453150082 + .long 1071985707 + .long 4109806887 + .long 1071989474 + .long 3395129871 + .long 1071993252 + .long 3723038930 + .long 1071997040 + .long 917841882 + .long 1072000839 + .long 3689071823 + .long 1072004647 + .long 3566716925 + .long 1072008466 + .long 671025100 + .long 1072012296 + .long 3712504873 + .long 1072016135 + .long 4222122499 + .long 1072019985 + .long 2321106615 + .long 1072023846 + .long 2425981843 + .long 1072027717 + .long 363667784 + .long 1072031599 + .long 551349105 + .long 1072035491 + .long 3111574537 + .long 1072039393 + .long 3872257780 + .long 1072043306 + .long 2956612997 + .long 1072047230 + .long 488188413 + .long 1072051165 + .long 885834528 + .long 1072055110 + .long 4273770423 + .long 1072059065 + .long 2186617381 + .long 1072063032 + .long 3339203574 + .long 1072067009 + .long 3561793907 + .long 1072070997 + .long 2979960120 + .long 1072074996 + .long 1719614413 + .long 1072079006 + .long 4201977662 + .long 1072083026 + .long 1963711167 + .long 1072087058 + .long 3721688645 + .long 1072091100 + .long 1013258799 + .long 1072095154 + .long 2555984613 + .long 1072099218 + .long 4182873220 + .long 1072103293 + .long 1727278727 + .long 1072107380 + .long 3907805044 + .long 1072111477 + .long 2263535754 + .long 1072115586 + .long 1218806132 + .long 1072119706 + .long 903334909 + .long 1072123837 + .long 1447192521 + .long 1072127979 + .long 2980802057 + .long 1072132132 + .long 1339972927 + .long 1072136297 + .long 950803702 + .long 1072140473 + .long 1944781191 + .long 1072144660 + .long 158781403 + .long 1072148859 + .long 19972402 + .long 1072153069 + 
.long 1660913392 + .long 1072157290 + .long 919555682 + .long 1072161523 + .long 2224145553 + .long 1072165767 + .long 1413356050 + .long 1072170023 + .long 2916157145 + .long 1072174290 + .long 2571947539 + .long 1072178569 + .long 515457527 + .long 1072182860 + .long 1176749997 + .long 1072187162 + .long 396319521 + .long 1072191476 + .long 2604962541 + .long 1072195801 + .long 3643909174 + .long 1072200138 + .long 3649726105 + .long 1072204487 + .long 2759350287 + .long 1072208848 + .long 1110089947 + .long 1072213221 + .long 3134592888 + .long 1072217605 + .long 380978316 + .long 1072222002 + .long 1577608921 + .long 1072226410 + .long 2568320822 + .long 1072230830 + .long 3492293770 + .long 1072235262 + .long 194117574 + .long 1072239707 + .long 1403662306 + .long 1072244163 + .long 2966275557 + .long 1072248631 + .long 727685349 + .long 1072253112 + .long 3418903055 + .long 1072257604 + .long 2591453363 + .long 1072262109 + .long 2682146384 + .long 1072266626 + .long 3833209506 + .long 1072271155 + .long 1892288442 + .long 1072275697 + .long 1297350157 + .long 1072280251 + .long 2191782032 + .long 1072284817 + .long 424392917 + .long 1072289396 + .long 434316067 + .long 1072293987 + .long 2366108318 + .long 1072298590 + .long 2069751141 + .long 1072303206 + .long 3985553595 + .long 1072307834 + .long 3964284211 + .long 1072312475 + .long 2152073944 + .long 1072317129 + .long 2990417245 + .long 1072321795 + .long 2331271250 + .long 1072326474 + .long 321958744 + .long 1072331166 + .long 1405169241 + .long 1072335870 + .long 1434058175 + .long 1072340587 + .long 557149882 + .long 1072345317 + .long 3218338682 + .long 1072350059 + .long 977020788 + .long 1072354815 + .long 2572866477 + .long 1072359583 + .long 3861050111 + .long 1072364364 + .long 697153126 + .long 1072369159 + .long 1822067026 + .long 1072373966 + .long 3092190715 + .long 1072378786 + .long 364333489 + .long 1072383620 + .long 2380618042 + .long 1072388466 + .long 703710506 + .long 1072393326 + 
.long 4076559943 + .long 1072398198 + .long 4062661092 + .long 1072403084 + .long 815859274 + .long 1072407984 + .long 3080351519 + .long 1072412896 + .long 2420883922 + .long 1072417822 + .long 3287523847 + .long 1072422761 + .long 1540824585 + .long 1072427714 + .long 1631695677 + .long 1072432680 + .long 3716502172 + .long 1072437659 + .long 3657065772 + .long 1072442652 + .long 1610600570 + .long 1072447659 + .long 2029714210 + .long 1072452679 + .long 777507147 + .long 1072457713 + .long 2307442995 + .long 1072462760 + .long 2483480501 + .long 1072467821 + .long 1464976603 + .long 1072472896 + .long 3706687593 + .long 1072477984 + .long 778901109 + .long 1072483087 + .long 1432208378 + .long 1072488203 + .long 1532734324 + .long 1072493333 + .long 1242007932 + .long 1072498477 + .long 721996136 + .long 1072503635 + .long 135105010 + .long 1072508807 + .long 3939148246 + .long 1072513992 + .long 3707479175 + .long 1072519192 + .long 3898795731 + .long 1072524406 + .long 382305176 + .long 1072529635 + .long 1912561781 + .long 1072534877 + .long 64696965 + .long 1072540134 + .long 3594158869 + .long 1072545404 + .long 4076975200 + .long 1072550689 + .long 1679558232 + .long 1072555989 + .long 863738719 + .long 1072561303 + .long 1796832535 + .long 1072566631 + .long 351641897 + .long 1072571974 + .long 991358482 + .long 1072577331 + .long 3884662774 + .long 1072582702 + .long 610758006 + .long 1072588089 + .long 4224142467 + .long 1072593489 + .long 2009970496 + .long 1072598905 + .long 2728693978 + .long 1072604335 + .long 2256325230 + .long 1072609780 + .long 764307441 + .long 1072615240 + .long 2719515920 + .long 1072620714 + .long 3999357479 + .long 1072626203 + .long 481706282 + .long 1072631708 + .long 929806999 + .long 1072637227 + .long 1222472308 + .long 1072642761 + .long 1533953344 + .long 1072648310 + .long 2038973688 + .long 1072653874 + .long 2912730644 + .long 1072659453 + .long 35929225 + .long 1072665048 + .long 2174652632 + .long 1072670657 + 
.long 915592468 + .long 1072676282 + .long 730821105 + .long 1072681922 + .long 1797923801 + .long 1072687577 + .long 0 + .long 1072693248 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 
3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dcosh_data_internal,@object + .size __svml_dcosh_data_internal,3776 + .space 832, 0x00 + .align 32 + +__dcosh_la_CoutTab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 1398474845 + .long 3161559171 + .long 3541402996 + .long 1072716208 + .long 2759177317 + .long 1015903202 + .long 410360776 + .long 1072727877 + .long 1269990655 + .long 1013024446 + .long 1828292879 + .long 1072739672 + .long 1255956747 + .long 1016636974 + .long 852742562 + .long 1072751596 + .long 667253587 + .long 1010842135 + .long 3490863953 + .long 1072763649 + .long 960797498 + .long 3163997456 + .long 2930322912 + .long 1072775834 + .long 2599499422 + .long 3163762623 + .long 1014845819 + .long 1072788152 + .long 3117910646 + .long 3162607681 + .long 3949972341 + .long 1072800603 + .long 2068408548 + .long 1015962444 + .long 828946858 + .long 1072813191 + .long 10642492 + .long 1016988014 + .long 2288159958 + .long 1072825915 + .long 2169144469 + .long 1015924597 + .long 1853186616 + .long 1072838778 + .long 3066496371 + .long 1016705150 + .long 1709341917 + .long 1072851781 + .long 2571168217 + .long 1015201075 + .long 4112506593 + .long 1072864925 + .long 2947355221 + .long 1015419624 + .long 2799960843 + .long 1072878213 + .long 1423655381 + .long 1016070727 + .long 171030293 + .long 1072891646 + .long 3526460132 + .long 1015477354 + .long 2992903935 + .long 1072905224 + .long 2218154406 + .long 1016276769 + .long 926591435 + .long 1072918951 + .long 3208833762 + .long 3163962090 + .long 887463927 + .long 1072932827 + .long 3596744163 + .long 3161842742 + .long 1276261410 + .long 1072946854 + .long 300981948 + .long 1015732745 + .long 569847338 + .long 1072961034 + .long 472945272 + .long 3160339305 + .long 1617004845 + .long 1072975368 + .long 82804944 + .long 1011391354 + .long 
3049340112 + .long 1072989858 + .long 3062915824 + .long 1014219171 + .long 3577096743 + .long 1073004506 + .long 2951496418 + .long 1014842263 + .long 1990012071 + .long 1073019314 + .long 3529070563 + .long 3163861769 + .long 1453150082 + .long 1073034283 + .long 498154669 + .long 3162536638 + .long 917841882 + .long 1073049415 + .long 18715565 + .long 1016707884 + .long 3712504873 + .long 1073064711 + .long 88491949 + .long 1016476236 + .long 363667784 + .long 1073080175 + .long 813753950 + .long 1016833785 + .long 2956612997 + .long 1073095806 + .long 2118169751 + .long 3163784129 + .long 2186617381 + .long 1073111608 + .long 2270764084 + .long 3164321289 + .long 1719614413 + .long 1073127582 + .long 330458198 + .long 3164331316 + .long 1013258799 + .long 1073143730 + .long 1748797611 + .long 3161177658 + .long 3907805044 + .long 1073160053 + .long 2257091225 + .long 3162598983 + .long 1447192521 + .long 1073176555 + .long 1462857171 + .long 3163563097 + .long 1944781191 + .long 1073193236 + .long 3993278767 + .long 3162772855 + .long 919555682 + .long 1073210099 + .long 3121969534 + .long 1013996802 + .long 2571947539 + .long 1073227145 + .long 3558159064 + .long 3164425245 + .long 2604962541 + .long 1073244377 + .long 2614425274 + .long 3164587768 + .long 1110089947 + .long 1073261797 + .long 1451641639 + .long 1016523249 + .long 2568320822 + .long 1073279406 + .long 2732824428 + .long 1015401491 + .long 2966275557 + .long 1073297207 + .long 2176155324 + .long 3160891335 + .long 2682146384 + .long 1073315202 + .long 2082178513 + .long 3164411995 + .long 2191782032 + .long 1073333393 + .long 2960257726 + .long 1014791238 + .long 2069751141 + .long 1073351782 + .long 1562170675 + .long 3163773257 + .long 2990417245 + .long 1073370371 + .long 3683467745 + .long 3164417902 + .long 1434058175 + .long 1073389163 + .long 251133233 + .long 1016134345 + .long 2572866477 + .long 1073408159 + .long 878562433 + .long 1016570317 + .long 3092190715 + .long 1073427362 + 
.long 814012168 + .long 3160571998 + .long 4076559943 + .long 1073446774 + .long 2119478331 + .long 3161806927 + .long 2420883922 + .long 1073466398 + .long 2049810052 + .long 1015168464 + .long 3716502172 + .long 1073486235 + .long 2303740125 + .long 1015091301 + .long 777507147 + .long 1073506289 + .long 4282924205 + .long 1016236109 + .long 3706687593 + .long 1073526560 + .long 3521726939 + .long 1014301643 + .long 1242007932 + .long 1073547053 + .long 1132034716 + .long 3164388407 + .long 3707479175 + .long 1073567768 + .long 3613079303 + .long 1015213314 + .long 64696965 + .long 1073588710 + .long 1768797490 + .long 1016865536 + .long 863738719 + .long 1073609879 + .long 1326992220 + .long 3163661773 + .long 3884662774 + .long 1073631278 + .long 2158611599 + .long 1015258761 + .long 2728693978 + .long 1073652911 + .long 396109971 + .long 3164511267 + .long 3999357479 + .long 1073674779 + .long 2258941616 + .long 1016973300 + .long 1533953344 + .long 1073696886 + .long 769171851 + .long 1016714209 + .long 2174652632 + .long 1073719233 + .long 4087714590 + .long 1015498835 + .long 0 + .long 1073741824 + .long 0 + .long 0 + .long 1697350398 + .long 1079448903 + .long 0 + .long 1127743488 + .long 0 + .long 1071644672 + .long 1431652600 + .long 1069897045 + .long 1431670732 + .long 1067799893 + .long 984555731 + .long 1065423122 + .long 472530941 + .long 1062650218 + .long 2411329662 + .long 1082536910 + .long 4277796864 + .long 1065758274 + .long 3164486458 + .long 1025308570 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 3875694624 + .long 1077247184 + .type __dcosh_la_CoutTab,@object + .size __dcosh_la_CoutTab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S new file mode 100644 index 0000000000..8b385cc297 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized cosh, 
vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVeN8v_cosh _ZGVeN8v_cosh_avx2_wrapper +#include "../svml_d_cosh8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c new file mode 100644 index 0000000000..576b3186d5 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized cosh, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define SYMBOL_NAME _ZGVeN8v_cosh +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8v_cosh, __GI__ZGVeN8v_cosh, __redirect__ZGVeN8v_cosh) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S new file mode 100644 index 0000000000..f4b46382e3 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S @@ -0,0 +1,1694 @@ +/* Function cosh vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
 */ + +/* + * ALGORITHM DESCRIPTION: + * + * Compute cosh(x) as (exp(x)+exp(-x))/2, + * where exp is calculated as + * exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r) + * + * Special cases: + * + * cosh(NaN) = quiet NaN (a signaling NaN raises the invalid exception) + * cosh(+-INF) = +INF + * cosh(0) = 1 + * cosh(x) overflows to +INF for |x| > MAXLOG+log(2) + * + */ + +#include + + .text +ENTRY(_ZGVeN8v_cosh_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $192, %rsp + vmovups 3648+__svml_dcosh_data_internal(%rip), %zmm11 + vmovups 384+__svml_dcosh_data_internal(%rip), %zmm15 + +/* + * Load argument + * dM = x*2^K/log(2) + RShifter + */ + vmovups 3008+__svml_dcosh_data_internal(%rip), %zmm4 + vmovups 3072+__svml_dcosh_data_internal(%rip), %zmm2 + vmovups 3136+__svml_dcosh_data_internal(%rip), %zmm3 + vmovups 832+__svml_dcosh_data_internal(%rip), %zmm8 + vmovups 768+__svml_dcosh_data_internal(%rip), %zmm9 + vmovups 512+__svml_dcosh_data_internal(%rip), %zmm7 + vmovups 576+__svml_dcosh_data_internal(%rip), %zmm6 + vmovaps %zmm0, %zmm10 + +/* Abs argument */ + vandnpd %zmm10, %zmm11, %zmm5 + +/* Index and lookup */ + vmovups __svml_dcosh_data_internal(%rip), %zmm11 + vmovups 256+__svml_dcosh_data_internal(%rip), %zmm0 + vfmadd213pd {rn-sae}, %zmm15, %zmm5, %zmm4 + +/* + * Check for overflow/underflow + * + */ + vpsrlq $32, %zmm5, %zmm12 + +/* dN = dM - RShifter */ + vsubpd {rn-sae}, %zmm15, %zmm4, %zmm1 + vpmovqd %zmm12, %ymm13 + vpermt2pd 320+__svml_dcosh_data_internal(%rip), %zmm4, %zmm0 + vpermt2pd 64+__svml_dcosh_data_internal(%rip), %zmm4, %zmm11 + +/* dR = dX - dN*Log2_hi/2^K */ + vfnmadd231pd {rn-sae}, %zmm2, %zmm1, %zmm5 + +/* + * poly(r) = Gmjp*(1 + a2*r^2 + a4*r^4) + Gmjn*(r + a3*r^3 + a5*r^5) = + * = Gmjp_h + Gmjp_l + Gmjp*r^2*(a2 + a4*r^2) + Gmjn*(r + r^3*(a3 + a5*r^2)) + */ + vmovups 704+__svml_dcosh_data_internal(%rip), %zmm12 + vpsllq $48, %zmm4, %zmm2 + +/* dR = dR - dN*Log2_lo/2^K */ +
vfnmadd231pd {rn-sae}, %zmm3, %zmm1, %zmm5 + vmulpd {rn-sae}, %zmm5, %zmm5, %zmm1 + vfmadd231pd {rn-sae}, %zmm1, %zmm8, %zmm12 + vmovups 640+__svml_dcosh_data_internal(%rip), %zmm8 + vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm12 + vfmadd231pd {rn-sae}, %zmm1, %zmm9, %zmm8 + vfmadd213pd {rn-sae}, %zmm7, %zmm1, %zmm8 + vpcmpgtd 3712+__svml_dcosh_data_internal(%rip), %ymm13, %ymm14 + vmovmskps %ymm14, %edx + +/* dOut=r^2*(a2 + a4*r^2) */ + vmulpd {rn-sae}, %zmm1, %zmm8, %zmm6 + +/* lM now is an EXP(2^N) */ + vpandq 3584+__svml_dcosh_data_internal(%rip), %zmm2, %zmm3 + vpaddq %zmm3, %zmm11, %zmm4 + vpsubq %zmm3, %zmm0, %zmm0 + vsubpd {rn-sae}, %zmm0, %zmm4, %zmm14 + vaddpd {rn-sae}, %zmm0, %zmm4, %zmm13 + +/* dM=r^2*(a3 +a5*r^2) */ + vmulpd {rn-sae}, %zmm1, %zmm12, %zmm0 + vfmadd213pd {rn-sae}, %zmm13, %zmm13, %zmm6 + +/* dM= r + r^3*(a3 +a5*r^2) */ + vfmadd213pd {rn-sae}, %zmm5, %zmm5, %zmm0 + vfmadd213pd {rn-sae}, %zmm6, %zmm14, %zmm0 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm10, 64(%rsp) + vmovups %zmm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 
0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), %k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 128(%rsp), %zmm0 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + call __svml_dcosh_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN8v_cosh_skx) + + .align 16,0x90 + +__svml_dcosh_cout_rare_internal: + + 
cfi_startproc + + movq %rsi, %r8 + movzwl 6(%rdi), %edx + xorl %eax, %eax + andl $32752, %edx + cmpl $32752, %edx + je .LBL_2_12 + movq (%rdi), %rdx + movq %rdx, -8(%rsp) + shrq $56, %rdx + andl $127, %edx + movb %dl, -1(%rsp) + movzwl -2(%rsp), %ecx + andl $32752, %ecx + cmpl $15504, %ecx + jle .LBL_2_10 + movsd -8(%rsp), %xmm0 + movsd 1096+__dcosh_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_9 + movq 1128+__dcosh_la_CoutTab(%rip), %rdx + movq %rdx, -8(%rsp) + comisd 1144+__dcosh_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movsd 1040+__dcosh_la_CoutTab(%rip), %xmm1 + lea __dcosh_la_CoutTab(%rip), %r9 + mulsd %xmm0, %xmm1 + addsd 1048+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movsd 1088+__dcosh_la_CoutTab(%rip), %xmm1 + movl -40(%rsp), %edx + movl %edx, %esi + andl $63, %esi + subsd 1048+__dcosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea (%rsi,%rsi), %ecx + movsd -32(%rsp), %xmm3 + lea 1(%rsi,%rsi), %edi + mulsd 1104+__dcosh_la_CoutTab(%rip), %xmm3 + movsd -32(%rsp), %xmm4 + subsd %xmm3, %xmm0 + mulsd 1112+__dcosh_la_CoutTab(%rip), %xmm4 + shrl $6, %edx + subsd %xmm4, %xmm0 + mulsd %xmm0, %xmm1 + addl $1022, %edx + andl $2047, %edx + addsd 1080+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1072+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1064+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1056+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + addsd %xmm0, %xmm1 + movsd (%r9,%rcx,8), %xmm0 + mulsd %xmm0, %xmm1 + addsd (%r9,%rdi,8), %xmm1 + addsd %xmm0, %xmm1 + cmpl $2046, %edx + ja .LBL_2_7 + movq 1128+__dcosh_la_CoutTab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, 
-2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm1 + mulsd 1024+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, (%r8) + ret + +.LBL_2_8: + movsd 1040+__dcosh_la_CoutTab(%rip), %xmm1 + lea __dcosh_la_CoutTab(%rip), %rcx + movzwl -2(%rsp), %esi + andl $-32753, %esi + movsd 1080+__dcosh_la_CoutTab(%rip), %xmm14 + mulsd %xmm0, %xmm1 + addsd 1048+__dcosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movl -40(%rsp), %r10d + movl %r10d, %r9d + shrl $6, %r9d + subsd 1048+__dcosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea 1023(%r9), %edi + andl $63, %r10d + addl $1022, %r9d + movsd -32(%rsp), %xmm3 + andl $2047, %r9d + negl %edi + shll $4, %r9d + addl $-4, %edi + mulsd 1104+__dcosh_la_CoutTab(%rip), %xmm3 + lea (%r10,%r10), %edx + movsd (%rcx,%rdx,8), %xmm15 + negl %edx + movsd -32(%rsp), %xmm4 + orl %r9d, %esi + andl $2047, %edi + lea 1(%r10,%r10), %r11d + mulsd 1112+__dcosh_la_CoutTab(%rip), %xmm4 + subsd %xmm3, %xmm0 + movw %si, -2(%rsp) + andl $-32753, %esi + shll $4, %edi + subsd %xmm4, %xmm0 + movsd -8(%rsp), %xmm6 + orl %edi, %esi + movw %si, -2(%rsp) + lea 128(%rdx), %esi + mulsd %xmm6, %xmm15 + movaps %xmm0, %xmm5 + mulsd %xmm0, %xmm5 + movsd -8(%rsp), %xmm7 + movaps %xmm15, %xmm8 + movsd (%rcx,%rsi,8), %xmm11 + addl $129, %edx + mulsd %xmm7, %xmm11 + movaps %xmm15, %xmm10 + mulsd %xmm5, %xmm14 + addsd %xmm11, %xmm8 + subsd %xmm11, %xmm15 + addsd 1064+__dcosh_la_CoutTab(%rip), %xmm14 + movsd %xmm8, -24(%rsp) + movsd (%rcx,%r11,8), %xmm12 + movsd (%rcx,%rdx,8), %xmm13 + movsd -24(%rsp), %xmm9 + mulsd %xmm6, %xmm12 + subsd %xmm9, %xmm10 + mulsd %xmm7, %xmm13 + mulsd %xmm5, %xmm14 + addsd %xmm11, %xmm10 + mulsd %xmm0, %xmm14 + movsd 1088+__dcosh_la_CoutTab(%rip), %xmm1 + movaps %xmm12, %xmm11 + mulsd %xmm5, %xmm1 + subsd %xmm13, %xmm12 + mulsd %xmm15, %xmm14 + mulsd %xmm0, %xmm12 + addsd 1072+__dcosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm15, %xmm0 + mulsd %xmm5, %xmm1 + addsd %xmm12, %xmm11 + movsd %xmm10, -16(%rsp) + addsd 
%xmm13, %xmm11 + addsd 1056+__dcosh_la_CoutTab(%rip), %xmm1 + addsd %xmm14, %xmm11 + mulsd %xmm5, %xmm1 + addsd %xmm0, %xmm11 + movsd -24(%rsp), %xmm3 + mulsd %xmm3, %xmm1 + movsd -16(%rsp), %xmm2 + addsd %xmm1, %xmm11 + addsd %xmm2, %xmm11 + movsd %xmm11, -24(%rsp) + movsd -24(%rsp), %xmm0 + addsd %xmm0, %xmm3 + movsd %xmm3, (%r8) + ret + +.LBL_2_9: + movsd 1120+__dcosh_la_CoutTab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_10: + movsd 1136+__dcosh_la_CoutTab(%rip), %xmm0 + addsd -8(%rsp), %xmm0 + movsd %xmm0, (%r8) + ret + +.LBL_2_12: + movsd (%rdi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + ret + + cfi_endproc + + .type __svml_dcosh_cout_rare_internal,@function + .size __svml_dcosh_cout_rare_internal,.-__svml_dcosh_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dcosh_data_internal: + .long 0 + .long 1071644672 + .long 1828292879 + .long 1071691096 + .long 1014845819 + .long 1071739576 + .long 1853186616 + .long 1071790202 + .long 171030293 + .long 1071843070 + .long 1276261410 + .long 1071898278 + .long 3577096743 + .long 1071955930 + .long 3712504873 + .long 1072016135 + .long 1719614413 + .long 1072079006 + .long 1944781191 + .long 1072144660 + .long 1110089947 + .long 1072213221 + .long 2191782032 + .long 1072284817 + .long 2572866477 + .long 1072359583 + .long 3716502172 + .long 1072437659 + .long 3707479175 + .long 1072519192 + .long 2728693978 + .long 1072604335 + .long 0 + .long 0 + .long 1255956747 + .long 1015588398 + .long 3117910646 + .long 3161559105 + .long 3066496371 + .long 1015656574 + .long 3526460132 + .long 1014428778 + .long 300981948 + .long 1014684169 + .long 2951496418 + .long 1013793687 + .long 88491949 + .long 1015427660 + .long 330458198 + .long 3163282740 + .long 3993278767 + .long 3161724279 + .long 1451641639 + .long 1015474673 + .long 2960257726 + .long 1013742662 + .long 878562433 + .long 1015521741 + .long 2303740125 + .long 1014042725 + .long 3613079303 + 
.long 1014164738 + .long 396109971 + .long 3163462691 + .long 0 + .long 1071644672 + .long 2728693978 + .long 1071555759 + .long 3707479175 + .long 1071470616 + .long 3716502172 + .long 1071389083 + .long 2572866477 + .long 1071311007 + .long 2191782032 + .long 1071236241 + .long 1110089947 + .long 1071164645 + .long 1944781191 + .long 1071096084 + .long 1719614413 + .long 1071030430 + .long 3712504873 + .long 1070967559 + .long 3577096743 + .long 1070907354 + .long 1276261410 + .long 1070849702 + .long 171030293 + .long 1070794494 + .long 1853186616 + .long 1070741626 + .long 1014845819 + .long 1070691000 + .long 1828292879 + .long 1070642520 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 4 + .long 1071644672 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1431655747 + .long 1069897045 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 1430802231 + .long 1067799893 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 
1065423121 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .long 287861260 + .long 1065423121 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 3658019094 + .long 1062650243 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 1993999322 + .long 1059717517 + .long 0 + .long 1071644672 + .long 4200250559 + .long 1071647514 + .long 2851812149 + .long 1071650365 + .long 339411585 + .long 1071653224 + .long 1048019041 + .long 1071656090 + .long 772914124 + .long 1071658964 + .long 3899555717 + .long 1071661845 + .long 1928746161 + .long 1071664735 + .long 3541402996 + .long 1071667632 + .long 238821257 + .long 1071670538 + .long 702412510 + .long 1071673451 + .long 728934454 + .long 1071676372 + .long 410360776 + .long 1071679301 + .long 4133881824 + .long 1071682237 + .long 3402036099 + .long 1071685182 + .long 2602514713 + .long 1071688135 + .long 1828292879 + .long 1071691096 + .long 1172597893 + .long 1071694065 + .long 728909815 + .long 1071697042 + .long 590962156 + .long 1071700027 + .long 852742562 + .long 1071703020 + .long 1608493509 + .long 1071706021 + .long 2952712987 + .long 1071709030 + .long 685187902 + .long 1071712048 + .long 3490863953 + .long 1071715073 + .long 2875075254 + .long 1071718107 + .long 3228316108 + .long 1071721149 + .long 351405227 + .long 1071724200 + .long 2930322912 + .long 1071727258 + .long 2471440686 + .long 1071730325 + .long 3366293073 + .long 1071733400 + .long 1416741826 + .long 1071736484 
+ .long 1014845819 + .long 1071739576 + .long 2257959872 + .long 1071742676 + .long 948735466 + .long 1071745785 + .long 1480023343 + .long 1071748902 + .long 3949972341 + .long 1071752027 + .long 4162030108 + .long 1071755161 + .long 2214878420 + .long 1071758304 + .long 2502433899 + .long 1071761455 + .long 828946858 + .long 1071764615 + .long 1588871207 + .long 1071767783 + .long 586995997 + .long 1071770960 + .long 2218315341 + .long 1071774145 + .long 2288159958 + .long 1071777339 + .long 897099801 + .long 1071780542 + .long 2440944790 + .long 1071783753 + .long 2725843665 + .long 1071786973 + .long 1853186616 + .long 1071790202 + .long 4219606026 + .long 1071793439 + .long 1337108031 + .long 1071796686 + .long 1897844341 + .long 1071799941 + .long 1709341917 + .long 1071803205 + .long 874372905 + .long 1071806478 + .long 3790955393 + .long 1071809759 + .long 1972484976 + .long 1071813050 + .long 4112506593 + .long 1071816349 + .long 1724976915 + .long 1071819658 + .long 3504003472 + .long 1071822975 + .long 964107055 + .long 1071826302 + .long 2799960843 + .long 1071829637 + .long 526652809 + .long 1071832982 + .long 2839424854 + .long 1071836335 + .long 1253935211 + .long 1071839698 + .long 171030293 + .long 1071843070 + .long 3991843581 + .long 1071846450 + .long 4232894513 + .long 1071849840 + .long 1000925746 + .long 1071853240 + .long 2992903935 + .long 1071856648 + .long 1726216749 + .long 1071860066 + .long 1603444721 + .long 1071863493 + .long 2732492859 + .long 1071866929 + .long 926591435 + .long 1071870375 + .long 589198666 + .long 1071873830 + .long 1829099622 + .long 1071877294 + .long 460407023 + .long 1071880768 + .long 887463927 + .long 1071884251 + .long 3219942644 + .long 1071887743 + .long 3272845541 + .long 1071891245 + .long 1156440435 + .long 1071894757 + .long 1276261410 + .long 1071898278 + .long 3743175029 + .long 1071901808 + .long 78413852 + .long 1071905349 + .long 3278348324 + .long 1071908898 + .long 569847338 + .long 1071912458 
+ .long 654919306 + .long 1071916027 + .long 3645941911 + .long 1071919605 + .long 1065662932 + .long 1071923194 + .long 1617004845 + .long 1071926792 + .long 1118294578 + .long 1071930400 + .long 3978100823 + .long 1071934017 + .long 1720398391 + .long 1071937645 + .long 3049340112 + .long 1071941282 + .long 3784486610 + .long 1071944929 + .long 4040676318 + .long 1071948586 + .long 3933059031 + .long 1071952253 + .long 3577096743 + .long 1071955930 + .long 3088564500 + .long 1071959617 + .long 2583551245 + .long 1071963314 + .long 2178460671 + .long 1071967021 + .long 1990012071 + .long 1071970738 + .long 2135241198 + .long 1071974465 + .long 2731501122 + .long 1071978202 + .long 3896463087 + .long 1071981949 + .long 1453150082 + .long 1071985707 + .long 4109806887 + .long 1071989474 + .long 3395129871 + .long 1071993252 + .long 3723038930 + .long 1071997040 + .long 917841882 + .long 1072000839 + .long 3689071823 + .long 1072004647 + .long 3566716925 + .long 1072008466 + .long 671025100 + .long 1072012296 + .long 3712504873 + .long 1072016135 + .long 4222122499 + .long 1072019985 + .long 2321106615 + .long 1072023846 + .long 2425981843 + .long 1072027717 + .long 363667784 + .long 1072031599 + .long 551349105 + .long 1072035491 + .long 3111574537 + .long 1072039393 + .long 3872257780 + .long 1072043306 + .long 2956612997 + .long 1072047230 + .long 488188413 + .long 1072051165 + .long 885834528 + .long 1072055110 + .long 4273770423 + .long 1072059065 + .long 2186617381 + .long 1072063032 + .long 3339203574 + .long 1072067009 + .long 3561793907 + .long 1072070997 + .long 2979960120 + .long 1072074996 + .long 1719614413 + .long 1072079006 + .long 4201977662 + .long 1072083026 + .long 1963711167 + .long 1072087058 + .long 3721688645 + .long 1072091100 + .long 1013258799 + .long 1072095154 + .long 2555984613 + .long 1072099218 + .long 4182873220 + .long 1072103293 + .long 1727278727 + .long 1072107380 + .long 3907805044 + .long 1072111477 + .long 2263535754 + .long 
1072115586 + .long 1218806132 + .long 1072119706 + .long 903334909 + .long 1072123837 + .long 1447192521 + .long 1072127979 + .long 2980802057 + .long 1072132132 + .long 1339972927 + .long 1072136297 + .long 950803702 + .long 1072140473 + .long 1944781191 + .long 1072144660 + .long 158781403 + .long 1072148859 + .long 19972402 + .long 1072153069 + .long 1660913392 + .long 1072157290 + .long 919555682 + .long 1072161523 + .long 2224145553 + .long 1072165767 + .long 1413356050 + .long 1072170023 + .long 2916157145 + .long 1072174290 + .long 2571947539 + .long 1072178569 + .long 515457527 + .long 1072182860 + .long 1176749997 + .long 1072187162 + .long 396319521 + .long 1072191476 + .long 2604962541 + .long 1072195801 + .long 3643909174 + .long 1072200138 + .long 3649726105 + .long 1072204487 + .long 2759350287 + .long 1072208848 + .long 1110089947 + .long 1072213221 + .long 3134592888 + .long 1072217605 + .long 380978316 + .long 1072222002 + .long 1577608921 + .long 1072226410 + .long 2568320822 + .long 1072230830 + .long 3492293770 + .long 1072235262 + .long 194117574 + .long 1072239707 + .long 1403662306 + .long 1072244163 + .long 2966275557 + .long 1072248631 + .long 727685349 + .long 1072253112 + .long 3418903055 + .long 1072257604 + .long 2591453363 + .long 1072262109 + .long 2682146384 + .long 1072266626 + .long 3833209506 + .long 1072271155 + .long 1892288442 + .long 1072275697 + .long 1297350157 + .long 1072280251 + .long 2191782032 + .long 1072284817 + .long 424392917 + .long 1072289396 + .long 434316067 + .long 1072293987 + .long 2366108318 + .long 1072298590 + .long 2069751141 + .long 1072303206 + .long 3985553595 + .long 1072307834 + .long 3964284211 + .long 1072312475 + .long 2152073944 + .long 1072317129 + .long 2990417245 + .long 1072321795 + .long 2331271250 + .long 1072326474 + .long 321958744 + .long 1072331166 + .long 1405169241 + .long 1072335870 + .long 1434058175 + .long 1072340587 + .long 557149882 + .long 1072345317 + .long 3218338682 + .long 
1072350059 + .long 977020788 + .long 1072354815 + .long 2572866477 + .long 1072359583 + .long 3861050111 + .long 1072364364 + .long 697153126 + .long 1072369159 + .long 1822067026 + .long 1072373966 + .long 3092190715 + .long 1072378786 + .long 364333489 + .long 1072383620 + .long 2380618042 + .long 1072388466 + .long 703710506 + .long 1072393326 + .long 4076559943 + .long 1072398198 + .long 4062661092 + .long 1072403084 + .long 815859274 + .long 1072407984 + .long 3080351519 + .long 1072412896 + .long 2420883922 + .long 1072417822 + .long 3287523847 + .long 1072422761 + .long 1540824585 + .long 1072427714 + .long 1631695677 + .long 1072432680 + .long 3716502172 + .long 1072437659 + .long 3657065772 + .long 1072442652 + .long 1610600570 + .long 1072447659 + .long 2029714210 + .long 1072452679 + .long 777507147 + .long 1072457713 + .long 2307442995 + .long 1072462760 + .long 2483480501 + .long 1072467821 + .long 1464976603 + .long 1072472896 + .long 3706687593 + .long 1072477984 + .long 778901109 + .long 1072483087 + .long 1432208378 + .long 1072488203 + .long 1532734324 + .long 1072493333 + .long 1242007932 + .long 1072498477 + .long 721996136 + .long 1072503635 + .long 135105010 + .long 1072508807 + .long 3939148246 + .long 1072513992 + .long 3707479175 + .long 1072519192 + .long 3898795731 + .long 1072524406 + .long 382305176 + .long 1072529635 + .long 1912561781 + .long 1072534877 + .long 64696965 + .long 1072540134 + .long 3594158869 + .long 1072545404 + .long 4076975200 + .long 1072550689 + .long 1679558232 + .long 1072555989 + .long 863738719 + .long 1072561303 + .long 1796832535 + .long 1072566631 + .long 351641897 + .long 1072571974 + .long 991358482 + .long 1072577331 + .long 3884662774 + .long 1072582702 + .long 610758006 + .long 1072588089 + .long 4224142467 + .long 1072593489 + .long 2009970496 + .long 1072598905 + .long 2728693978 + .long 1072604335 + .long 2256325230 + .long 1072609780 + .long 764307441 + .long 1072615240 + .long 2719515920 + .long 
1072620714 + .long 3999357479 + .long 1072626203 + .long 481706282 + .long 1072631708 + .long 929806999 + .long 1072637227 + .long 1222472308 + .long 1072642761 + .long 1533953344 + .long 1072648310 + .long 2038973688 + .long 1072653874 + .long 2912730644 + .long 1072659453 + .long 35929225 + .long 1072665048 + .long 2174652632 + .long 1072670657 + .long 915592468 + .long 1072676282 + .long 730821105 + .long 1072681922 + .long 1797923801 + .long 1072687577 + .long 0 + .long 1072693248 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 1697350398 + .long 1073157447 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 4277927936 + .long 1072049730 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 2825664665 + .long 3182190860 + .long 0 + .long 1119354880 + .long 
0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 0 + .long 1119354880 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 4294966717 + .long 1071644671 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 1887518228 + .long 1069897045 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 3474379417 + .long 1067799893 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2146435072 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 1082531225 + .long 1082531225 + 
.long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .long 1082531225 + .type __svml_dcosh_data_internal,@object + .size __svml_dcosh_data_internal,3776 + .space 832, 0x00 + .align 32 + +__dcosh_la_CoutTab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 1398474845 + .long 3161559171 + .long 3541402996 + .long 1072716208 + .long 2759177317 + .long 1015903202 + .long 410360776 + .long 1072727877 + .long 1269990655 + .long 1013024446 + .long 1828292879 + .long 1072739672 + .long 1255956747 + .long 1016636974 + .long 852742562 + .long 1072751596 + .long 667253587 + .long 1010842135 + .long 3490863953 + .long 1072763649 + .long 960797498 + .long 3163997456 + .long 2930322912 + .long 1072775834 + .long 2599499422 + .long 3163762623 + .long 1014845819 + .long 1072788152 + .long 3117910646 + .long 3162607681 + .long 3949972341 + .long 1072800603 + .long 2068408548 + .long 1015962444 + .long 828946858 + .long 1072813191 + .long 10642492 + .long 1016988014 + .long 2288159958 + .long 1072825915 + .long 2169144469 + .long 1015924597 + .long 1853186616 + .long 1072838778 + .long 3066496371 + .long 1016705150 + .long 1709341917 + .long 1072851781 + .long 2571168217 + .long 1015201075 + .long 4112506593 + .long 1072864925 + .long 2947355221 + .long 1015419624 + .long 2799960843 + .long 1072878213 + .long 1423655381 + .long 1016070727 + .long 171030293 + .long 1072891646 + .long 3526460132 + .long 1015477354 + .long 2992903935 + .long 1072905224 + .long 2218154406 + .long 1016276769 + .long 926591435 + .long 1072918951 + .long 3208833762 + .long 3163962090 + .long 887463927 + .long 1072932827 + .long 3596744163 + .long 3161842742 + .long 1276261410 + .long 1072946854 + .long 300981948 + .long 1015732745 + .long 569847338 + .long 1072961034 + 
.long 472945272 + .long 3160339305 + .long 1617004845 + .long 1072975368 + .long 82804944 + .long 1011391354 + .long 3049340112 + .long 1072989858 + .long 3062915824 + .long 1014219171 + .long 3577096743 + .long 1073004506 + .long 2951496418 + .long 1014842263 + .long 1990012071 + .long 1073019314 + .long 3529070563 + .long 3163861769 + .long 1453150082 + .long 1073034283 + .long 498154669 + .long 3162536638 + .long 917841882 + .long 1073049415 + .long 18715565 + .long 1016707884 + .long 3712504873 + .long 1073064711 + .long 88491949 + .long 1016476236 + .long 363667784 + .long 1073080175 + .long 813753950 + .long 1016833785 + .long 2956612997 + .long 1073095806 + .long 2118169751 + .long 3163784129 + .long 2186617381 + .long 1073111608 + .long 2270764084 + .long 3164321289 + .long 1719614413 + .long 1073127582 + .long 330458198 + .long 3164331316 + .long 1013258799 + .long 1073143730 + .long 1748797611 + .long 3161177658 + .long 3907805044 + .long 1073160053 + .long 2257091225 + .long 3162598983 + .long 1447192521 + .long 1073176555 + .long 1462857171 + .long 3163563097 + .long 1944781191 + .long 1073193236 + .long 3993278767 + .long 3162772855 + .long 919555682 + .long 1073210099 + .long 3121969534 + .long 1013996802 + .long 2571947539 + .long 1073227145 + .long 3558159064 + .long 3164425245 + .long 2604962541 + .long 1073244377 + .long 2614425274 + .long 3164587768 + .long 1110089947 + .long 1073261797 + .long 1451641639 + .long 1016523249 + .long 2568320822 + .long 1073279406 + .long 2732824428 + .long 1015401491 + .long 2966275557 + .long 1073297207 + .long 2176155324 + .long 3160891335 + .long 2682146384 + .long 1073315202 + .long 2082178513 + .long 3164411995 + .long 2191782032 + .long 1073333393 + .long 2960257726 + .long 1014791238 + .long 2069751141 + .long 1073351782 + .long 1562170675 + .long 3163773257 + .long 2990417245 + .long 1073370371 + .long 3683467745 + .long 3164417902 + .long 1434058175 + .long 1073389163 + .long 251133233 + .long 1016134345 + 
.long 2572866477 + .long 1073408159 + .long 878562433 + .long 1016570317 + .long 3092190715 + .long 1073427362 + .long 814012168 + .long 3160571998 + .long 4076559943 + .long 1073446774 + .long 2119478331 + .long 3161806927 + .long 2420883922 + .long 1073466398 + .long 2049810052 + .long 1015168464 + .long 3716502172 + .long 1073486235 + .long 2303740125 + .long 1015091301 + .long 777507147 + .long 1073506289 + .long 4282924205 + .long 1016236109 + .long 3706687593 + .long 1073526560 + .long 3521726939 + .long 1014301643 + .long 1242007932 + .long 1073547053 + .long 1132034716 + .long 3164388407 + .long 3707479175 + .long 1073567768 + .long 3613079303 + .long 1015213314 + .long 64696965 + .long 1073588710 + .long 1768797490 + .long 1016865536 + .long 863738719 + .long 1073609879 + .long 1326992220 + .long 3163661773 + .long 3884662774 + .long 1073631278 + .long 2158611599 + .long 1015258761 + .long 2728693978 + .long 1073652911 + .long 396109971 + .long 3164511267 + .long 3999357479 + .long 1073674779 + .long 2258941616 + .long 1016973300 + .long 1533953344 + .long 1073696886 + .long 769171851 + .long 1016714209 + .long 2174652632 + .long 1073719233 + .long 4087714590 + .long 1015498835 + .long 0 + .long 1073741824 + .long 0 + .long 0 + .long 1697350398 + .long 1079448903 + .long 0 + .long 1127743488 + .long 0 + .long 1071644672 + .long 1431652600 + .long 1069897045 + .long 1431670732 + .long 1067799893 + .long 984555731 + .long 1065423122 + .long 472530941 + .long 1062650218 + .long 2411329662 + .long 1082536910 + .long 4277796864 + .long 1065758274 + .long 3164486458 + .long 1025308570 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 3875694624 + .long 1077247184 + .type __dcosh_la_CoutTab,@object + .size __dcosh_la_CoutTab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S new file mode 100644 index 0000000000..456d8a129f --- /dev/null 
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized coshf. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVeN16v_coshf _ZGVeN16v_coshf_avx2_wrapper +#include "../svml_s_coshf16_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c new file mode 100644 index 0000000000..34c008871a --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized coshf, vector length is 16. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define SYMBOL_NAME _ZGVeN16v_coshf +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16v_coshf, __GI__ZGVeN16v_coshf, + __redirect__ZGVeN16v_coshf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S new file mode 100644 index 0000000000..c837384b8e --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S @@ -0,0 +1,1190 @@ +/* Function coshf vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * + * Compute cosh(x) as (exp(x)+exp(-x))/2, + * where exp is calculated as + * exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r) + * + * Special cases: + * + * cosh(NaN) = quiet NaN, and raise invalid exception + * cosh(INF) = that INF + * cosh(0) = 1 + * cosh(x) overflows for big x and returns MAXLOG+log(2) + * + */ + +#include + + .text +ENTRY(_ZGVeN16v_coshf_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $192, %rsp + vmovups 1024+__svml_scosh_data_internal(%rip), %zmm4 + vmovups 384+__svml_scosh_data_internal(%rip), %zmm6 + +/* + * Load argument + * dM = x/log(2) + RShifter + */ + vmovups 768+__svml_scosh_data_internal(%rip), %zmm10 + vmovups 896+__svml_scosh_data_internal(%rip), %zmm7 + vmovups 960+__svml_scosh_data_internal(%rip), %zmm9 + +/* */ + vmovups 704+__svml_scosh_data_internal(%rip), %zmm2 + +/* x^2 */ + vmovups 640+__svml_scosh_data_internal(%rip), %zmm3 + +/* G1,G2 2^N,2^(-N) */ + vmovups __svml_scosh_data_internal(%rip), %zmm12 + vmovups 256+__svml_scosh_data_internal(%rip), %zmm13 + +/* + * Implementation + * Abs argument + */ + vandnps %zmm0, %zmm4, %zmm1 + +/* Check for overflow\underflow */ + vpternlogd $255, %zmm5, %zmm5, %zmm5 + vfmadd213ps {rn-sae}, %zmm6, %zmm1, %zmm10 + vpcmpd $1, 512+__svml_scosh_data_internal(%rip), %zmm1, %k1 + +/* iM now is an EXP(2^N) */ + vpslld $18, %zmm10, %zmm11 + +/* + * R + * sN = sM - RShifter + */ + vsubps {rn-sae}, %zmm6, %zmm10, %zmm8 + vpermt2ps 64+__svml_scosh_data_internal(%rip), %zmm10, %zmm12 + vpermt2ps 320+__svml_scosh_data_internal(%rip), %zmm10, %zmm13 + vpandnd %zmm1, %zmm1, %zmm5{%k1} + +/* sR = sX - sN*Log2_hi */ + vfnmadd231ps {rn-sae}, %zmm7, %zmm8, %zmm1 + vptestmd %zmm5, %zmm5, %k0 + +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */ + vfnmadd231ps {rn-sae}, %zmm9, %zmm8, %zmm1 + kmovw %k0, %edx + vmulps {rn-sae}, %zmm1, %zmm1, %zmm4 + vmulps {rn-sae}, %zmm4, %zmm2, %zmm2 + 
+/* sSinh_r = r + r*(r^2*(a3)) */ + vfmadd213ps {rn-sae}, %zmm1, %zmm1, %zmm2 + +/* sOut = r^2*(a2) */ + vmulps {rn-sae}, %zmm4, %zmm3, %zmm1 + vpandd 1216+__svml_scosh_data_internal(%rip), %zmm11, %zmm14 + vpaddd %zmm14, %zmm12, %zmm15 + vpsubd %zmm14, %zmm13, %zmm10 + +/* sG2 = 2^N*Th + 2^(-N)*T_h */ + vaddps {rn-sae}, %zmm10, %zmm15, %zmm5 + +/* sG1 = 2^N*Th - 2^(-N)*T_h */ + vsubps {rn-sae}, %zmm10, %zmm15, %zmm6 + +/* res = sG1*(r + r*(r^2*(a3))) + sG2*(1+r^2*(a2)) */ + vfmadd213ps {rn-sae}, %zmm5, %zmm5, %zmm1 + vfmadd213ps {rn-sae}, %zmm1, %zmm2, %zmm6 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovaps %zmm6, %zmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm0, 64(%rsp) + vmovups %zmm6, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 
0xff, 0xff, 0x22 + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $16, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), %k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 128(%rsp), %zmm6 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + call __svml_scosh_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN16v_coshf_skx) + + .align 16,0x90 + +__svml_scosh_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 2(%rdi), %edx + xorl %eax, %eax + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_12 + pxor %xmm0, %xmm0 + cvtss2sd (%rdi), %xmm0 + movsd %xmm0, -8(%rsp) + andb $127, -1(%rsp) + movzwl -2(%rsp), %edx + andl $32752, %edx + cmpl $15504, %edx + jle .LBL_2_10 + movsd -8(%rsp), %xmm0 + movsd 
1096+__scosh_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_9 + movq 1128+__scosh_la_CoutTab(%rip), %rdx + movq %rdx, -8(%rsp) + comisd 1144+__scosh_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movsd 1040+__scosh_la_CoutTab(%rip), %xmm1 + lea __scosh_la_CoutTab(%rip), %r9 + mulsd %xmm0, %xmm1 + addsd 1048+__scosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movsd 1088+__scosh_la_CoutTab(%rip), %xmm1 + movl -40(%rsp), %edx + movl %edx, %esi + andl $63, %esi + subsd 1048+__scosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea (%rsi,%rsi), %ecx + movsd -32(%rsp), %xmm3 + lea 1(%rsi,%rsi), %edi + mulsd 1104+__scosh_la_CoutTab(%rip), %xmm3 + movsd -32(%rsp), %xmm4 + subsd %xmm3, %xmm0 + mulsd 1112+__scosh_la_CoutTab(%rip), %xmm4 + shrl $6, %edx + subsd %xmm4, %xmm0 + mulsd %xmm0, %xmm1 + addl $1022, %edx + andl $2047, %edx + addsd 1080+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1072+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1064+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1056+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + addsd %xmm0, %xmm1 + movsd (%r9,%rcx,8), %xmm0 + mulsd %xmm0, %xmm1 + addsd (%r9,%rdi,8), %xmm1 + addsd %xmm0, %xmm1 + cmpl $2046, %edx + ja .LBL_2_7 + movq 1128+__scosh_la_CoutTab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm1 + mulsd 1024+__scosh_la_CoutTab(%rip), %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, (%r8) + ret + +.LBL_2_8: + movsd 1040+__scosh_la_CoutTab(%rip), %xmm1 + lea __scosh_la_CoutTab(%rip), %rcx + movzwl -2(%rsp), %esi + andl $-32753, %esi + movsd 
1080+__scosh_la_CoutTab(%rip), %xmm14 + mulsd %xmm0, %xmm1 + addsd 1048+__scosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movl -40(%rsp), %r10d + movl %r10d, %r9d + shrl $6, %r9d + subsd 1048+__scosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea 1023(%r9), %edi + movsd -32(%rsp), %xmm3 + addl $1022, %r9d + mulsd 1104+__scosh_la_CoutTab(%rip), %xmm3 + andl $63, %r10d + movsd -32(%rsp), %xmm4 + lea (%r10,%r10), %edx + mulsd 1112+__scosh_la_CoutTab(%rip), %xmm4 + subsd %xmm3, %xmm0 + andl $2047, %r9d + negl %edi + movsd (%rcx,%rdx,8), %xmm15 + negl %edx + shll $4, %r9d + addl $-4, %edi + orl %r9d, %esi + andl $2047, %edi + movw %si, -2(%rsp) + andl $-32753, %esi + shll $4, %edi + lea 1(%r10,%r10), %r11d + movsd -8(%rsp), %xmm6 + orl %edi, %esi + movw %si, -2(%rsp) + lea 128(%rdx), %esi + addl $129, %edx + subsd %xmm4, %xmm0 + mulsd %xmm6, %xmm15 + movaps %xmm0, %xmm5 + movaps %xmm15, %xmm8 + mulsd %xmm0, %xmm5 + movaps %xmm15, %xmm10 + movsd (%rcx,%r11,8), %xmm2 + mulsd %xmm6, %xmm2 + mulsd %xmm5, %xmm14 + movsd -8(%rsp), %xmm7 + movaps %xmm2, %xmm12 + movsd (%rcx,%rdx,8), %xmm13 + mulsd %xmm7, %xmm13 + addsd 1064+__scosh_la_CoutTab(%rip), %xmm14 + movsd 1088+__scosh_la_CoutTab(%rip), %xmm1 + subsd %xmm13, %xmm12 + mulsd %xmm5, %xmm1 + mulsd %xmm5, %xmm14 + mulsd %xmm0, %xmm12 + addsd 1072+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm14 + addsd %xmm12, %xmm2 + mulsd %xmm5, %xmm1 + addsd %xmm13, %xmm2 + addsd 1056+__scosh_la_CoutTab(%rip), %xmm1 + movsd (%rcx,%rsi,8), %xmm11 + mulsd %xmm7, %xmm11 + mulsd %xmm5, %xmm1 + addsd %xmm11, %xmm8 + subsd %xmm11, %xmm15 + movsd %xmm8, -24(%rsp) + movsd -24(%rsp), %xmm9 + mulsd %xmm15, %xmm14 + subsd %xmm9, %xmm10 + mulsd %xmm15, %xmm0 + addsd %xmm11, %xmm10 + addsd %xmm14, %xmm2 + movsd %xmm10, -16(%rsp) + addsd %xmm0, %xmm2 + movsd -24(%rsp), %xmm3 + mulsd %xmm3, %xmm1 + movsd -16(%rsp), %xmm6 + addsd %xmm1, %xmm2 + addsd %xmm6, %xmm2 + movsd %xmm2, -24(%rsp) + movsd -24(%rsp), 
%xmm0 + addsd %xmm0, %xmm3 + cvtsd2ss %xmm3, %xmm3 + movss %xmm3, (%r8) + ret + +.LBL_2_9: + movsd 1120+__scosh_la_CoutTab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_10: + movsd 1136+__scosh_la_CoutTab(%rip), %xmm0 + addsd -8(%rsp), %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_12: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + + cfi_endproc + + .type __svml_scosh_cout_rare_internal,@function + .size __svml_scosh_cout_rare_internal,.-__svml_scosh_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_scosh_data_internal: + .long 1056964608 + .long 1057148295 + .long 1057336003 + .long 1057527823 + .long 1057723842 + .long 1057924154 + .long 1058128851 + .long 1058338032 + .long 1058551792 + .long 1058770234 + .long 1058993458 + .long 1059221571 + .long 1059454679 + .long 1059692891 + .long 1059936319 + .long 1060185078 + .long 1060439283 + .long 1060699055 + .long 1060964516 + .long 1061235789 + .long 1061513002 + .long 1061796286 + .long 1062085772 + .long 1062381598 + .long 1062683901 + .long 1062992824 + .long 1063308511 + .long 1063631111 + .long 1063960775 + .long 1064297658 + .long 1064641917 + .long 1064993715 + .long 0 + .long 2999887785 + .long 852465809 + .long 3003046475 + .long 2984291233 + .long 3001644133 + .long 854021668 + .long 2997748242 + .long 849550193 + .long 2995541347 + .long 851518274 + .long 809701978 + .long 2997656926 + .long 2996185864 + .long 2980965110 + .long 3002882728 + .long 844097402 + .long 848217591 + .long 2999013352 + .long 2992006718 + .long 831170615 + .long 3002278818 + .long 833158180 + .long 3000769962 + .long 2991891850 + .long 2999994908 + .long 2979965785 + .long 2982419430 + .long 2982221534 + .long 2999469642 + .long 833168438 + .long 2987538264 + .long 1056964608 + .long 1056605107 + .long 1056253309 + .long 1055909050 + .long 1055572167 + .long 1055242503 + .long 1054919903 + .long 
1054604216 + .long 1054295293 + .long 1053992990 + .long 1053697164 + .long 1053407678 + .long 1053124394 + .long 1052847181 + .long 1052575908 + .long 1052310447 + .long 1052050675 + .long 1051796470 + .long 1051547711 + .long 1051304283 + .long 1051066071 + .long 1050832963 + .long 1050604850 + .long 1050381626 + .long 1050163184 + .long 1049949424 + .long 1049740243 + .long 1049535546 + .long 1049335234 + .long 1049139215 + .long 1048947395 + .long 1048759687 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 
1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 3212836864 + .long 3212836864 + 
.long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1042983511 + .long 1042983511 + 
.long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .type __svml_scosh_data_internal,@object + .size __svml_scosh_data_internal,1920 + .align 32 + +__scosh_la_CoutTab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 1398474845 + .long 3161559171 + .long 3541402996 + .long 1072716208 + .long 2759177317 + .long 1015903202 + .long 410360776 + .long 1072727877 + .long 1269990655 + .long 1013024446 + .long 1828292879 + .long 1072739672 + .long 1255956747 + .long 1016636974 + .long 852742562 + .long 1072751596 + .long 667253587 + .long 
1010842135 + .long 3490863953 + .long 1072763649 + .long 960797498 + .long 3163997456 + .long 2930322912 + .long 1072775834 + .long 2599499422 + .long 3163762623 + .long 1014845819 + .long 1072788152 + .long 3117910646 + .long 3162607681 + .long 3949972341 + .long 1072800603 + .long 2068408548 + .long 1015962444 + .long 828946858 + .long 1072813191 + .long 10642492 + .long 1016988014 + .long 2288159958 + .long 1072825915 + .long 2169144469 + .long 1015924597 + .long 1853186616 + .long 1072838778 + .long 3066496371 + .long 1016705150 + .long 1709341917 + .long 1072851781 + .long 2571168217 + .long 1015201075 + .long 4112506593 + .long 1072864925 + .long 2947355221 + .long 1015419624 + .long 2799960843 + .long 1072878213 + .long 1423655381 + .long 1016070727 + .long 171030293 + .long 1072891646 + .long 3526460132 + .long 1015477354 + .long 2992903935 + .long 1072905224 + .long 2218154406 + .long 1016276769 + .long 926591435 + .long 1072918951 + .long 3208833762 + .long 3163962090 + .long 887463927 + .long 1072932827 + .long 3596744163 + .long 3161842742 + .long 1276261410 + .long 1072946854 + .long 300981948 + .long 1015732745 + .long 569847338 + .long 1072961034 + .long 472945272 + .long 3160339305 + .long 1617004845 + .long 1072975368 + .long 82804944 + .long 1011391354 + .long 3049340112 + .long 1072989858 + .long 3062915824 + .long 1014219171 + .long 3577096743 + .long 1073004506 + .long 2951496418 + .long 1014842263 + .long 1990012071 + .long 1073019314 + .long 3529070563 + .long 3163861769 + .long 1453150082 + .long 1073034283 + .long 498154669 + .long 3162536638 + .long 917841882 + .long 1073049415 + .long 18715565 + .long 1016707884 + .long 3712504873 + .long 1073064711 + .long 88491949 + .long 1016476236 + .long 363667784 + .long 1073080175 + .long 813753950 + .long 1016833785 + .long 2956612997 + .long 1073095806 + .long 2118169751 + .long 3163784129 + .long 2186617381 + .long 1073111608 + .long 2270764084 + .long 3164321289 + .long 1719614413 + .long 
1073127582 + .long 330458198 + .long 3164331316 + .long 1013258799 + .long 1073143730 + .long 1748797611 + .long 3161177658 + .long 3907805044 + .long 1073160053 + .long 2257091225 + .long 3162598983 + .long 1447192521 + .long 1073176555 + .long 1462857171 + .long 3163563097 + .long 1944781191 + .long 1073193236 + .long 3993278767 + .long 3162772855 + .long 919555682 + .long 1073210099 + .long 3121969534 + .long 1013996802 + .long 2571947539 + .long 1073227145 + .long 3558159064 + .long 3164425245 + .long 2604962541 + .long 1073244377 + .long 2614425274 + .long 3164587768 + .long 1110089947 + .long 1073261797 + .long 1451641639 + .long 1016523249 + .long 2568320822 + .long 1073279406 + .long 2732824428 + .long 1015401491 + .long 2966275557 + .long 1073297207 + .long 2176155324 + .long 3160891335 + .long 2682146384 + .long 1073315202 + .long 2082178513 + .long 3164411995 + .long 2191782032 + .long 1073333393 + .long 2960257726 + .long 1014791238 + .long 2069751141 + .long 1073351782 + .long 1562170675 + .long 3163773257 + .long 2990417245 + .long 1073370371 + .long 3683467745 + .long 3164417902 + .long 1434058175 + .long 1073389163 + .long 251133233 + .long 1016134345 + .long 2572866477 + .long 1073408159 + .long 878562433 + .long 1016570317 + .long 3092190715 + .long 1073427362 + .long 814012168 + .long 3160571998 + .long 4076559943 + .long 1073446774 + .long 2119478331 + .long 3161806927 + .long 2420883922 + .long 1073466398 + .long 2049810052 + .long 1015168464 + .long 3716502172 + .long 1073486235 + .long 2303740125 + .long 1015091301 + .long 777507147 + .long 1073506289 + .long 4282924205 + .long 1016236109 + .long 3706687593 + .long 1073526560 + .long 3521726939 + .long 1014301643 + .long 1242007932 + .long 1073547053 + .long 1132034716 + .long 3164388407 + .long 3707479175 + .long 1073567768 + .long 3613079303 + .long 1015213314 + .long 64696965 + .long 1073588710 + .long 1768797490 + .long 1016865536 + .long 863738719 + .long 1073609879 + .long 1326992220 + 
.long 3163661773 + .long 3884662774 + .long 1073631278 + .long 2158611599 + .long 1015258761 + .long 2728693978 + .long 1073652911 + .long 396109971 + .long 3164511267 + .long 3999357479 + .long 1073674779 + .long 2258941616 + .long 1016973300 + .long 1533953344 + .long 1073696886 + .long 769171851 + .long 1016714209 + .long 2174652632 + .long 1073719233 + .long 4087714590 + .long 1015498835 + .long 0 + .long 1073741824 + .long 0 + .long 0 + .long 1697350398 + .long 1079448903 + .long 0 + .long 1127743488 + .long 0 + .long 1071644672 + .long 1431652600 + .long 1069897045 + .long 1431670732 + .long 1067799893 + .long 984555731 + .long 1065423122 + .long 472530941 + .long 1062650218 + .long 2684354560 + .long 1079401119 + .long 4277796864 + .long 1065758274 + .long 3164486458 + .long 1025308570 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 3875694624 + .long 1077247184 + .type __scosh_la_CoutTab,@object + .size __scosh_la_CoutTab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S new file mode 100644 index 0000000000..c719dc7d6a --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized coshf, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define _ZGVbN4v_coshf _ZGVbN4v_coshf_sse2 +#include "../svml_s_coshf4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c new file mode 100644 index 0000000000..c2dfcd44f8 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized coshf, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define SYMBOL_NAME _ZGVbN4v_coshf +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4v_coshf, __GI__ZGVbN4v_coshf, + __redirect__ZGVbN4v_coshf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S new file mode 100644 index 0000000000..d1b541f18e --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S @@ -0,0 +1,2104 @@ +/* Function coshf vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * Compute cosh(x) as (exp(x)+exp(-x))/2, + * where exp is calculated as + * exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r) + * + * Special cases: + * + * cosh(NaN) = quiet NaN, and raise invalid exception + * cosh(+/-INF) = +INF + * cosh(0) = 1 + * cosh(x) overflows for big x and returns MAXLOG+log(2) + * + */ + +#include + + .text +ENTRY(_ZGVbN4v_coshf_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + movaps %xmm0, %xmm5 + +/* + * Implementation + * Abs argument + */ + movups 1024+__svml_scosh_data_internal(%rip), %xmm3 + +/* + * Load argument + * dM = x/log(2) + RShifter + */ + movups 768+__svml_scosh_data_internal(%rip), %xmm2 + andnps %xmm5, %xmm3 + mulps %xmm3, %xmm2 + +/* Check for overflow/underflow */ + movaps %xmm3, %xmm7 + movups 1280+__svml_scosh_data_internal(%rip), %xmm0 + movdqu 1408+__svml_scosh_data_internal(%rip), %xmm6 + addps %xmm0, %xmm2 + +/* + * R + * sN = sM - RShifter + */ + movaps %xmm2, %xmm1 + pcmpgtd %xmm6, %xmm7 + pcmpeqd %xmm3, %xmm6 + +/* + * G1,G2 2^N,2^(-N) + * iM now is an EXP(2^N) + */ + pslld $23, %xmm2 + por %xmm6, %xmm7 + subps %xmm0, %xmm1 + movmskps %xmm7, %edx + +/* sR = sX - sN*Log2_hi */ + movups 
896+__svml_scosh_data_internal(%rip), %xmm0 + mulps %xmm1, %xmm0 + +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */ + movups 960+__svml_scosh_data_internal(%rip), %xmm6 + subps %xmm0, %xmm3 + mulps %xmm1, %xmm6 + movdqu 1856+__svml_scosh_data_internal(%rip), %xmm4 + subps %xmm6, %xmm3 + +/* sR2 = sR^2, shuffled */ + movaps %xmm3, %xmm1 + movdqa %xmm4, %xmm0 + mulps %xmm3, %xmm1 + paddd %xmm2, %xmm0 + psubd %xmm2, %xmm4 + +/* sG1 = 2^(N-1)-2^(-N-1) */ + movdqa %xmm0, %xmm6 + +/* + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*(a5+{v1 r^2*a7}))) = r + r*(r^2*(a3+r^2*(a5+r^2*a7))) .... + * sSinh_r = (a3+r^2*a5) + */ + movups 1728+__svml_scosh_data_internal(%rip), %xmm2 + subps %xmm4, %xmm6 + mulps %xmm1, %xmm2 + +/* sG2 = 2^(N-1)+2^(-N-1) */ + addps %xmm4, %xmm0 + +/* + * cosh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) + * sOut = (a4 +a6*sR2) + */ + movups 1792+__svml_scosh_data_internal(%rip), %xmm4 + mulps %xmm1, %xmm4 + addps 1600+__svml_scosh_data_internal(%rip), %xmm2 + +/* sSinh_r = r^2*(a3+r^2*a5) */ + mulps %xmm1, %xmm2 + addps 1664+__svml_scosh_data_internal(%rip), %xmm4 + +/* sOut = a2+sR2*(a4+a6*sR2) */ + mulps %xmm1, %xmm4 + +/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */ + mulps %xmm3, %xmm2 + addps 1536+__svml_scosh_data_internal(%rip), %xmm4 + addps %xmm2, %xmm3 + +/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */ + mulps %xmm4, %xmm1 + +/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */ + mulps %xmm3, %xmm6 + +/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */ + mulps %xmm0, %xmm1 + addps %xmm6, %xmm1 + +/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */ + addps %xmm1, %xmm0 + +/* Ret H */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm5, 192(%rsp) + movups %xmm0, 256(%rsp) + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 
32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), 
%xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,4), %rdi + lea 256(%rsp,%r12,4), %rsi + call __svml_scosh_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN4v_coshf_sse4) + + .align 16,0x90 + +__svml_scosh_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 2(%rdi), %edx + xorl %eax, %eax + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_12 + pxor %xmm0, %xmm0 + cvtss2sd (%rdi), %xmm0 + 
movsd %xmm0, -8(%rsp) + andb $127, -1(%rsp) + movzwl -2(%rsp), %edx + andl $32752, %edx + cmpl $15504, %edx + jle .LBL_2_10 + movsd -8(%rsp), %xmm0 + movsd 1096+__scosh_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_9 + movq 1128+__scosh_la_CoutTab(%rip), %rdx + movq %rdx, -8(%rsp) + comisd 1144+__scosh_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movsd 1040+__scosh_la_CoutTab(%rip), %xmm1 + lea __scosh_la_CoutTab(%rip), %r9 + mulsd %xmm0, %xmm1 + addsd 1048+__scosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movsd 1088+__scosh_la_CoutTab(%rip), %xmm1 + movl -40(%rsp), %edx + movl %edx, %esi + andl $63, %esi + subsd 1048+__scosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea (%rsi,%rsi), %ecx + movsd -32(%rsp), %xmm3 + lea 1(%rsi,%rsi), %edi + mulsd 1104+__scosh_la_CoutTab(%rip), %xmm3 + movsd -32(%rsp), %xmm4 + subsd %xmm3, %xmm0 + mulsd 1112+__scosh_la_CoutTab(%rip), %xmm4 + shrl $6, %edx + subsd %xmm4, %xmm0 + mulsd %xmm0, %xmm1 + addl $1022, %edx + andl $2047, %edx + addsd 1080+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1072+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1064+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1056+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + addsd %xmm0, %xmm1 + movsd (%r9,%rcx,8), %xmm0 + mulsd %xmm0, %xmm1 + addsd (%r9,%rdi,8), %xmm1 + addsd %xmm0, %xmm1 + cmpl $2046, %edx + ja .LBL_2_7 + movq 1128+__scosh_la_CoutTab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm1 + mulsd 1024+__scosh_la_CoutTab(%rip), %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, (%r8) + ret + +.LBL_2_8: + 
movsd 1040+__scosh_la_CoutTab(%rip), %xmm1 + lea __scosh_la_CoutTab(%rip), %rcx + movzwl -2(%rsp), %esi + andl $-32753, %esi + movsd 1080+__scosh_la_CoutTab(%rip), %xmm14 + mulsd %xmm0, %xmm1 + addsd 1048+__scosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movl -40(%rsp), %r10d + movl %r10d, %r9d + shrl $6, %r9d + subsd 1048+__scosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea 1023(%r9), %edi + movsd -32(%rsp), %xmm3 + addl $1022, %r9d + mulsd 1104+__scosh_la_CoutTab(%rip), %xmm3 + andl $63, %r10d + movsd -32(%rsp), %xmm4 + lea (%r10,%r10), %edx + mulsd 1112+__scosh_la_CoutTab(%rip), %xmm4 + subsd %xmm3, %xmm0 + andl $2047, %r9d + negl %edi + movsd (%rcx,%rdx,8), %xmm15 + negl %edx + shll $4, %r9d + addl $-4, %edi + orl %r9d, %esi + andl $2047, %edi + movw %si, -2(%rsp) + andl $-32753, %esi + shll $4, %edi + lea 1(%r10,%r10), %r11d + movsd -8(%rsp), %xmm6 + orl %edi, %esi + movw %si, -2(%rsp) + lea 128(%rdx), %esi + addl $129, %edx + subsd %xmm4, %xmm0 + mulsd %xmm6, %xmm15 + movaps %xmm0, %xmm5 + movaps %xmm15, %xmm8 + mulsd %xmm0, %xmm5 + movaps %xmm15, %xmm10 + movsd (%rcx,%r11,8), %xmm2 + mulsd %xmm6, %xmm2 + mulsd %xmm5, %xmm14 + movsd -8(%rsp), %xmm7 + movaps %xmm2, %xmm12 + movsd (%rcx,%rdx,8), %xmm13 + mulsd %xmm7, %xmm13 + addsd 1064+__scosh_la_CoutTab(%rip), %xmm14 + movsd 1088+__scosh_la_CoutTab(%rip), %xmm1 + subsd %xmm13, %xmm12 + mulsd %xmm5, %xmm1 + mulsd %xmm5, %xmm14 + mulsd %xmm0, %xmm12 + addsd 1072+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm14 + addsd %xmm12, %xmm2 + mulsd %xmm5, %xmm1 + addsd %xmm13, %xmm2 + addsd 1056+__scosh_la_CoutTab(%rip), %xmm1 + movsd (%rcx,%rsi,8), %xmm11 + mulsd %xmm7, %xmm11 + mulsd %xmm5, %xmm1 + addsd %xmm11, %xmm8 + subsd %xmm11, %xmm15 + movsd %xmm8, -24(%rsp) + movsd -24(%rsp), %xmm9 + mulsd %xmm15, %xmm14 + subsd %xmm9, %xmm10 + mulsd %xmm15, %xmm0 + addsd %xmm11, %xmm10 + addsd %xmm14, %xmm2 + movsd %xmm10, -16(%rsp) + addsd %xmm0, %xmm2 + movsd -24(%rsp), %xmm3 
+ mulsd %xmm3, %xmm1 + movsd -16(%rsp), %xmm6 + addsd %xmm1, %xmm2 + addsd %xmm6, %xmm2 + movsd %xmm2, -24(%rsp) + movsd -24(%rsp), %xmm0 + addsd %xmm0, %xmm3 + cvtsd2ss %xmm3, %xmm3 + movss %xmm3, (%r8) + ret + +.LBL_2_9: + movsd 1120+__scosh_la_CoutTab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_10: + movsd 1136+__scosh_la_CoutTab(%rip), %xmm0 + addsd -8(%rsp), %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_12: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + + cfi_endproc + + .type __svml_scosh_cout_rare_internal,@function + .size __svml_scosh_cout_rare_internal,.-__svml_scosh_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_scosh_data_internal: + .long 1056964608 + .long 1057148295 + .long 1057336003 + .long 1057527823 + .long 1057723842 + .long 1057924154 + .long 1058128851 + .long 1058338032 + .long 1058551792 + .long 1058770234 + .long 1058993458 + .long 1059221571 + .long 1059454679 + .long 1059692891 + .long 1059936319 + .long 1060185078 + .long 1060439283 + .long 1060699055 + .long 1060964516 + .long 1061235789 + .long 1061513002 + .long 1061796286 + .long 1062085772 + .long 1062381598 + .long 1062683901 + .long 1062992824 + .long 1063308511 + .long 1063631111 + .long 1063960775 + .long 1064297658 + .long 1064641917 + .long 1064993715 + .long 0 + .long 2999887785 + .long 852465809 + .long 3003046475 + .long 2984291233 + .long 3001644133 + .long 854021668 + .long 2997748242 + .long 849550193 + .long 2995541347 + .long 851518274 + .long 809701978 + .long 2997656926 + .long 2996185864 + .long 2980965110 + .long 3002882728 + .long 844097402 + .long 848217591 + .long 2999013352 + .long 2992006718 + .long 831170615 + .long 3002278818 + .long 833158180 + .long 3000769962 + .long 2991891850 + .long 2999994908 + .long 2979965785 + .long 2982419430 + .long 2982221534 + .long 2999469642 + .long 833168438 + .long 2987538264 + .long 
1056964608 + .long 1056605107 + .long 1056253309 + .long 1055909050 + .long 1055572167 + .long 1055242503 + .long 1054919903 + .long 1054604216 + .long 1054295293 + .long 1053992990 + .long 1053697164 + .long 1053407678 + .long 1053124394 + .long 1052847181 + .long 1052575908 + .long 1052310447 + .long 1052050675 + .long 1051796470 + .long 1051547711 + .long 1051304283 + .long 1051066071 + .long 1050832963 + .long 1050604850 + .long 1050381626 + .long 1050163184 + .long 1049949424 + .long 1049740243 + .long 1049535546 + .long 1049335234 + .long 1049139215 + .long 1048947395 + .long 1048759687 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 31 + .long 31 + .long 31 + .long 31 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 31 + .long 31 + .long 31 + .long 31 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 
1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_scosh_data_internal,@object + .size __svml_scosh_data_internal,1920 + .align 32 + +__scosh_la_CoutTab: 
+ .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 1398474845 + .long 3161559171 + .long 3541402996 + .long 1072716208 + .long 2759177317 + .long 1015903202 + .long 410360776 + .long 1072727877 + .long 1269990655 + .long 1013024446 + .long 1828292879 + .long 1072739672 + .long 1255956747 + .long 1016636974 + .long 852742562 + .long 1072751596 + .long 667253587 + .long 1010842135 + .long 3490863953 + .long 1072763649 + .long 960797498 + .long 3163997456 + .long 2930322912 + .long 1072775834 + .long 2599499422 + .long 3163762623 + .long 1014845819 + .long 1072788152 + .long 3117910646 + .long 3162607681 + .long 3949972341 + .long 1072800603 + .long 2068408548 + .long 1015962444 + .long 828946858 + .long 1072813191 + .long 10642492 + .long 1016988014 + .long 2288159958 + .long 1072825915 + .long 2169144469 + .long 1015924597 + .long 1853186616 + .long 1072838778 + .long 3066496371 + .long 1016705150 + .long 1709341917 + .long 1072851781 + .long 2571168217 + .long 1015201075 + .long 4112506593 + .long 1072864925 + .long 2947355221 + .long 1015419624 + .long 2799960843 + .long 1072878213 + .long 1423655381 + .long 1016070727 + .long 171030293 + .long 1072891646 + .long 3526460132 + .long 1015477354 + .long 2992903935 + .long 1072905224 + .long 2218154406 + .long 1016276769 + .long 926591435 + .long 1072918951 + .long 3208833762 + .long 3163962090 + .long 887463927 + .long 1072932827 + .long 3596744163 + .long 3161842742 + .long 1276261410 + .long 1072946854 + .long 300981948 + .long 1015732745 + .long 569847338 + .long 1072961034 + .long 472945272 + .long 3160339305 + .long 1617004845 + .long 1072975368 + .long 82804944 + .long 1011391354 + .long 3049340112 + .long 1072989858 + .long 3062915824 + .long 1014219171 + .long 3577096743 + .long 1073004506 + .long 2951496418 + .long 1014842263 + .long 1990012071 + .long 1073019314 + .long 3529070563 + .long 3163861769 + .long 1453150082 + .long 1073034283 + .long 498154669 + .long 
3162536638 + .long 917841882 + .long 1073049415 + .long 18715565 + .long 1016707884 + .long 3712504873 + .long 1073064711 + .long 88491949 + .long 1016476236 + .long 363667784 + .long 1073080175 + .long 813753950 + .long 1016833785 + .long 2956612997 + .long 1073095806 + .long 2118169751 + .long 3163784129 + .long 2186617381 + .long 1073111608 + .long 2270764084 + .long 3164321289 + .long 1719614413 + .long 1073127582 + .long 330458198 + .long 3164331316 + .long 1013258799 + .long 1073143730 + .long 1748797611 + .long 3161177658 + .long 3907805044 + .long 1073160053 + .long 2257091225 + .long 3162598983 + .long 1447192521 + .long 1073176555 + .long 1462857171 + .long 3163563097 + .long 1944781191 + .long 1073193236 + .long 3993278767 + .long 3162772855 + .long 919555682 + .long 1073210099 + .long 3121969534 + .long 1013996802 + .long 2571947539 + .long 1073227145 + .long 3558159064 + .long 3164425245 + .long 2604962541 + .long 1073244377 + .long 2614425274 + .long 3164587768 + .long 1110089947 + .long 1073261797 + .long 1451641639 + .long 1016523249 + .long 2568320822 + .long 1073279406 + .long 2732824428 + .long 1015401491 + .long 2966275557 + .long 1073297207 + .long 2176155324 + .long 3160891335 + .long 2682146384 + .long 1073315202 + .long 2082178513 + .long 3164411995 + .long 2191782032 + .long 1073333393 + .long 2960257726 + .long 1014791238 + .long 2069751141 + .long 1073351782 + .long 1562170675 + .long 3163773257 + .long 2990417245 + .long 1073370371 + .long 3683467745 + .long 3164417902 + .long 1434058175 + .long 1073389163 + .long 251133233 + .long 1016134345 + .long 2572866477 + .long 1073408159 + .long 878562433 + .long 1016570317 + .long 3092190715 + .long 1073427362 + .long 814012168 + .long 3160571998 + .long 4076559943 + .long 1073446774 + .long 2119478331 + .long 3161806927 + .long 2420883922 + .long 1073466398 + .long 2049810052 + .long 1015168464 + .long 3716502172 + .long 1073486235 + .long 2303740125 + .long 1015091301 + .long 777507147 + 
.long 1073506289 + .long 4282924205 + .long 1016236109 + .long 3706687593 + .long 1073526560 + .long 3521726939 + .long 1014301643 + .long 1242007932 + .long 1073547053 + .long 1132034716 + .long 3164388407 + .long 3707479175 + .long 1073567768 + .long 3613079303 + .long 1015213314 + .long 64696965 + .long 1073588710 + .long 1768797490 + .long 1016865536 + .long 863738719 + .long 1073609879 + .long 1326992220 + .long 3163661773 + .long 3884662774 + .long 1073631278 + .long 2158611599 + .long 1015258761 + .long 2728693978 + .long 1073652911 + .long 396109971 + .long 3164511267 + .long 3999357479 + .long 1073674779 + .long 2258941616 + .long 1016973300 + .long 1533953344 + .long 1073696886 + .long 769171851 + .long 1016714209 + .long 2174652632 + .long 1073719233 + .long 4087714590 + .long 1015498835 + .long 0 + .long 1073741824 + .long 0 + .long 0 + .long 1697350398 + .long 1079448903 + .long 0 + .long 1127743488 + .long 0 + .long 1071644672 + .long 1431652600 + .long 1069897045 + .long 1431670732 + .long 1067799893 + .long 984555731 + .long 1065423122 + .long 472530941 + .long 1062650218 + .long 2684354560 + .long 1079401119 + .long 4277796864 + .long 1065758274 + .long 3164486458 + .long 1025308570 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 3875694624 + .long 1077247184 + .type __scosh_la_CoutTab,@object + .size __scosh_la_CoutTab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S new file mode 100644 index 0000000000..c27229e1fa --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized coshf, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define _ZGVdN8v_coshf _ZGVdN8v_coshf_sse_wrapper +#include "../svml_s_coshf8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c new file mode 100644 index 0000000000..e82818b2c9 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized coshf, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/.
*/ + +#define SYMBOL_NAME _ZGVdN8v_coshf +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8v_coshf, __GI__ZGVdN8v_coshf, + __redirect__ZGVdN8v_coshf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S new file mode 100644 index 0000000000..9f9e56003d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S @@ -0,0 +1,1803 @@ +/* Function coshf vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
 */ + +/* + * ALGORITHM DESCRIPTION: + * + * Compute cosh(x) as (exp(x)+exp(-x))/2, + * where exp is calculated as + * exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r) + * + * Special cases: + * + * cosh(NaN) = quiet NaN, and raise invalid exception + * cosh(+/-INF) = +INF + * cosh(0) = 1 + * cosh(x) overflows for big x and returns MAXLOG+log(2) + * + */ + +#include + + .text +ENTRY(_ZGVdN8v_coshf_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + vmovups 1024+__svml_scosh_data_internal(%rip), %ymm2 + vmovups 1280+__svml_scosh_data_internal(%rip), %ymm7 + vmovups 1408+__svml_scosh_data_internal(%rip), %ymm3 + vmovups %ymm9, 96(%rsp) + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + +/* + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*(a5+{v1 r^2*a7})))) = r + r*(r^2*(a3+r^2*(a5+r^2*a7))) .... + * sSinh_r = (a3+r^2*a5) + */ + vmovups 1728+__svml_scosh_data_internal(%rip), %ymm9 + vmovups %ymm11, 192(%rsp) + vmovups %ymm8, 32(%rsp) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + vmovups 1856+__svml_scosh_data_internal(%rip), %ymm8 + vmovups %ymm10, 160(%rsp) + vmovups %ymm13, 256(%rsp) + vmovups %ymm12, 224(%rsp) + vmovups %ymm14, 288(%rsp) + vmovups %ymm15, 320(%rsp) + vmovaps %ymm0, %ymm1 + +/* + * Implementation + * Abs argument + */ + vandnps %ymm1, %ymm2, %ymm0 + +/* + * Load argument + * dM = x/log(2) + RShifter + */ + vmovups 768+__svml_scosh_data_internal(%rip), %ymm2 + vfmadd213ps %ymm7, %ymm0, %ymm2 + +/* Check for overflow/underflow */ + vpcmpgtd %ymm3, %ymm0, %ymm4 + vpcmpeqd %ymm3, %ymm0, %ymm5 + +/* + * R + * sN = sM - RShifter + */ + vsubps %ymm7, %ymm2, %ymm3 + vpor %ymm5, %ymm4, %ymm6 + +/* + * G1,G2
2^N,2^(-N) + * iM now is an EXP(2^N) + */ + vpslld $23, %ymm2, %ymm4 + +/* sR = sX - sN*Log2_hi */ + vfnmadd231ps 896+__svml_scosh_data_internal(%rip), %ymm3, %ymm0 + vpaddd %ymm4, %ymm8, %ymm5 + +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */ + vfnmadd231ps 960+__svml_scosh_data_internal(%rip), %ymm3, %ymm0 + +/* sR2 = sR^2,shuffled */ + vmulps %ymm0, %ymm0, %ymm7 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + vfmadd213ps 1600+__svml_scosh_data_internal(%rip), %ymm7, %ymm9 + +/* sSinh_r = r^2*(a3+r^2*a5) */ + vmulps %ymm9, %ymm7, %ymm11 + +/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */ + vfmadd213ps %ymm0, %ymm0, %ymm11 + +/* + * cosh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) + * sOut = (a4 +a6*sR2) + */ + vmovups 1792+__svml_scosh_data_internal(%rip), %ymm0 + vfmadd213ps 1664+__svml_scosh_data_internal(%rip), %ymm7, %ymm0 + +/* sOut = a2+sR2*(a4+a6*sR2) */ + vfmadd213ps 1536+__svml_scosh_data_internal(%rip), %ymm7, %ymm0 + vmovmskps %ymm6, %edx + vpsubd %ymm4, %ymm8, %ymm6 + +/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */ + vmulps %ymm0, %ymm7, %ymm10 + +/* sG2 = 2^(N-1)+2^(-N-1) */ + vaddps %ymm6, %ymm5, %ymm13 + +/* sG1 = 2^(N-1)-2^(-N-1) */ + vsubps %ymm6, %ymm5, %ymm14 + +/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */ + vmulps %ymm10, %ymm13, %ymm12 + +/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */ + vfmadd213ps %ymm12, %ymm11, %ymm14 + +/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */ + vaddps %ymm14,
%ymm13, %ymm0 + +/* Ret H */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 192(%rsp), %ymm11 + cfi_restore(94) + vmovups 224(%rsp), %ymm12 + cfi_restore(95) + vmovups 256(%rsp), %ymm13 + cfi_restore(96) + vmovups 288(%rsp), %ymm14 + cfi_restore(97) + vmovups 320(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovups %ymm1, 64(%rsp) + vmovups %ymm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 
0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovups 128(%rsp), %ymm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + call __svml_scosh_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVdN8v_coshf_avx2) + + .align 16,0x90 + +__svml_scosh_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 2(%rdi), %edx + xorl %eax, %eax + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_12 + pxor %xmm0, %xmm0 + cvtss2sd (%rdi), %xmm0 + movsd %xmm0, -8(%rsp) + andb $127, -1(%rsp) + movzwl -2(%rsp), %edx + andl $32752, %edx + cmpl $15504, %edx + jle .LBL_2_10 + movsd -8(%rsp), %xmm0 + movsd 1096+__scosh_la_CoutTab(%rip), %xmm1 + comisd %xmm0, %xmm1 + jbe .LBL_2_9 + movq 1128+__scosh_la_CoutTab(%rip), %rdx + movq %rdx, -8(%rsp) + comisd 1144+__scosh_la_CoutTab(%rip), %xmm0 + jb .LBL_2_8 + movsd 1040+__scosh_la_CoutTab(%rip), %xmm1 + lea __scosh_la_CoutTab(%rip), %r9 + mulsd %xmm0, %xmm1 + addsd 1048+__scosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movsd 1088+__scosh_la_CoutTab(%rip), %xmm1 + movl -40(%rsp), %edx + movl 
%edx, %esi + andl $63, %esi + subsd 1048+__scosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea (%rsi,%rsi), %ecx + movsd -32(%rsp), %xmm3 + lea 1(%rsi,%rsi), %edi + mulsd 1104+__scosh_la_CoutTab(%rip), %xmm3 + movsd -32(%rsp), %xmm4 + subsd %xmm3, %xmm0 + mulsd 1112+__scosh_la_CoutTab(%rip), %xmm4 + shrl $6, %edx + subsd %xmm4, %xmm0 + mulsd %xmm0, %xmm1 + addl $1022, %edx + andl $2047, %edx + addsd 1080+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1072+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1064+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + addsd 1056+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm1 + mulsd %xmm0, %xmm1 + addsd %xmm0, %xmm1 + movsd (%r9,%rcx,8), %xmm0 + mulsd %xmm0, %xmm1 + addsd (%r9,%rdi,8), %xmm1 + addsd %xmm0, %xmm1 + cmpl $2046, %edx + ja .LBL_2_7 + movq 1128+__scosh_la_CoutTab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm1 + mulsd 1024+__scosh_la_CoutTab(%rip), %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, (%r8) + ret + +.LBL_2_8: + movsd 1040+__scosh_la_CoutTab(%rip), %xmm1 + lea __scosh_la_CoutTab(%rip), %rcx + movzwl -2(%rsp), %esi + andl $-32753, %esi + movsd 1080+__scosh_la_CoutTab(%rip), %xmm14 + mulsd %xmm0, %xmm1 + addsd 1048+__scosh_la_CoutTab(%rip), %xmm1 + movsd %xmm1, -40(%rsp) + movsd -40(%rsp), %xmm2 + movl -40(%rsp), %r10d + movl %r10d, %r9d + shrl $6, %r9d + subsd 1048+__scosh_la_CoutTab(%rip), %xmm2 + movsd %xmm2, -32(%rsp) + lea 1023(%r9), %edi + movsd -32(%rsp), %xmm3 + addl $1022, %r9d + mulsd 1104+__scosh_la_CoutTab(%rip), %xmm3 + andl $63, %r10d + movsd -32(%rsp), %xmm4 + lea (%r10,%r10), %edx + mulsd 
1112+__scosh_la_CoutTab(%rip), %xmm4 + subsd %xmm3, %xmm0 + andl $2047, %r9d + negl %edi + movsd (%rcx,%rdx,8), %xmm15 + negl %edx + shll $4, %r9d + addl $-4, %edi + orl %r9d, %esi + andl $2047, %edi + movw %si, -2(%rsp) + andl $-32753, %esi + shll $4, %edi + lea 1(%r10,%r10), %r11d + movsd -8(%rsp), %xmm6 + orl %edi, %esi + movw %si, -2(%rsp) + lea 128(%rdx), %esi + addl $129, %edx + subsd %xmm4, %xmm0 + mulsd %xmm6, %xmm15 + movaps %xmm0, %xmm5 + movaps %xmm15, %xmm8 + mulsd %xmm0, %xmm5 + movaps %xmm15, %xmm10 + movsd (%rcx,%r11,8), %xmm2 + mulsd %xmm6, %xmm2 + mulsd %xmm5, %xmm14 + movsd -8(%rsp), %xmm7 + movaps %xmm2, %xmm12 + movsd (%rcx,%rdx,8), %xmm13 + mulsd %xmm7, %xmm13 + addsd 1064+__scosh_la_CoutTab(%rip), %xmm14 + movsd 1088+__scosh_la_CoutTab(%rip), %xmm1 + subsd %xmm13, %xmm12 + mulsd %xmm5, %xmm1 + mulsd %xmm5, %xmm14 + mulsd %xmm0, %xmm12 + addsd 1072+__scosh_la_CoutTab(%rip), %xmm1 + mulsd %xmm0, %xmm14 + addsd %xmm12, %xmm2 + mulsd %xmm5, %xmm1 + addsd %xmm13, %xmm2 + addsd 1056+__scosh_la_CoutTab(%rip), %xmm1 + movsd (%rcx,%rsi,8), %xmm11 + mulsd %xmm7, %xmm11 + mulsd %xmm5, %xmm1 + addsd %xmm11, %xmm8 + subsd %xmm11, %xmm15 + movsd %xmm8, -24(%rsp) + movsd -24(%rsp), %xmm9 + mulsd %xmm15, %xmm14 + subsd %xmm9, %xmm10 + mulsd %xmm15, %xmm0 + addsd %xmm11, %xmm10 + addsd %xmm14, %xmm2 + movsd %xmm10, -16(%rsp) + addsd %xmm0, %xmm2 + movsd -24(%rsp), %xmm3 + mulsd %xmm3, %xmm1 + movsd -16(%rsp), %xmm6 + addsd %xmm1, %xmm2 + addsd %xmm6, %xmm2 + movsd %xmm2, -24(%rsp) + movsd -24(%rsp), %xmm0 + addsd %xmm0, %xmm3 + cvtsd2ss %xmm3, %xmm3 + movss %xmm3, (%r8) + ret + +.LBL_2_9: + movsd 1120+__scosh_la_CoutTab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_10: + movsd 1136+__scosh_la_CoutTab(%rip), %xmm0 + addsd -8(%rsp), %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + +.LBL_2_12: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + ret + + cfi_endproc + + .type 
__svml_scosh_cout_rare_internal,@function + .size __svml_scosh_cout_rare_internal,.-__svml_scosh_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_scosh_data_internal: + .long 1056964608 + .long 1057148295 + .long 1057336003 + .long 1057527823 + .long 1057723842 + .long 1057924154 + .long 1058128851 + .long 1058338032 + .long 1058551792 + .long 1058770234 + .long 1058993458 + .long 1059221571 + .long 1059454679 + .long 1059692891 + .long 1059936319 + .long 1060185078 + .long 1060439283 + .long 1060699055 + .long 1060964516 + .long 1061235789 + .long 1061513002 + .long 1061796286 + .long 1062085772 + .long 1062381598 + .long 1062683901 + .long 1062992824 + .long 1063308511 + .long 1063631111 + .long 1063960775 + .long 1064297658 + .long 1064641917 + .long 1064993715 + .long 0 + .long 2999887785 + .long 852465809 + .long 3003046475 + .long 2984291233 + .long 3001644133 + .long 854021668 + .long 2997748242 + .long 849550193 + .long 2995541347 + .long 851518274 + .long 809701978 + .long 2997656926 + .long 2996185864 + .long 2980965110 + .long 3002882728 + .long 844097402 + .long 848217591 + .long 2999013352 + .long 2992006718 + .long 831170615 + .long 3002278818 + .long 833158180 + .long 3000769962 + .long 2991891850 + .long 2999994908 + .long 2979965785 + .long 2982419430 + .long 2982221534 + .long 2999469642 + .long 833168438 + .long 2987538264 + .long 1056964608 + .long 1056605107 + .long 1056253309 + .long 1055909050 + .long 1055572167 + .long 1055242503 + .long 1054919903 + .long 1054604216 + .long 1054295293 + .long 1053992990 + .long 1053697164 + .long 1053407678 + .long 1053124394 + .long 1052847181 + .long 1052575908 + .long 1052310447 + .long 1052050675 + .long 1051796470 + .long 1051547711 + .long 1051304283 + .long 1051066071 + .long 1050832963 + .long 1050604850 + .long 1050381626 + .long 1050163184 + .long 1049949424 + .long 1049740243 + .long 1049535546 + .long 1049335234 + .long 1049139215 + .long 1048947395 + .long 1048759687 + .long 
1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .long 1220542464 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .long 1056964879 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .long 1042983629 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .long 1069066811 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .long 849703008 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .long 1060204544 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .long 939916788 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .long 31 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .long 1118743630 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 
1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .long 1042983511 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .long 1026206322 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .long 1007228001 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .long 985049251 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_scosh_data_internal,@object + .size __svml_scosh_data_internal,1920 + .align 32 + +__scosh_la_CoutTab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 1398474845 + .long 3161559171 + .long 3541402996 + .long 1072716208 + .long 2759177317 + .long 1015903202 + .long 410360776 + .long 1072727877 + .long 1269990655 + .long 1013024446 + .long 1828292879 + .long 1072739672 + .long 1255956747 + .long 1016636974 + .long 852742562 + .long 1072751596 + .long 667253587 + .long 1010842135 + .long 3490863953 + .long 1072763649 + .long 960797498 + .long 3163997456 + .long 2930322912 + .long 1072775834 + .long 2599499422 + .long 3163762623 + .long 1014845819 + .long 1072788152 + .long 3117910646 + .long 3162607681 + .long 3949972341 + .long 1072800603 + .long 
2068408548 + .long 1015962444 + .long 828946858 + .long 1072813191 + .long 10642492 + .long 1016988014 + .long 2288159958 + .long 1072825915 + .long 2169144469 + .long 1015924597 + .long 1853186616 + .long 1072838778 + .long 3066496371 + .long 1016705150 + .long 1709341917 + .long 1072851781 + .long 2571168217 + .long 1015201075 + .long 4112506593 + .long 1072864925 + .long 2947355221 + .long 1015419624 + .long 2799960843 + .long 1072878213 + .long 1423655381 + .long 1016070727 + .long 171030293 + .long 1072891646 + .long 3526460132 + .long 1015477354 + .long 2992903935 + .long 1072905224 + .long 2218154406 + .long 1016276769 + .long 926591435 + .long 1072918951 + .long 3208833762 + .long 3163962090 + .long 887463927 + .long 1072932827 + .long 3596744163 + .long 3161842742 + .long 1276261410 + .long 1072946854 + .long 300981948 + .long 1015732745 + .long 569847338 + .long 1072961034 + .long 472945272 + .long 3160339305 + .long 1617004845 + .long 1072975368 + .long 82804944 + .long 1011391354 + .long 3049340112 + .long 1072989858 + .long 3062915824 + .long 1014219171 + .long 3577096743 + .long 1073004506 + .long 2951496418 + .long 1014842263 + .long 1990012071 + .long 1073019314 + .long 3529070563 + .long 3163861769 + .long 1453150082 + .long 1073034283 + .long 498154669 + .long 3162536638 + .long 917841882 + .long 1073049415 + .long 18715565 + .long 1016707884 + .long 3712504873 + .long 1073064711 + .long 88491949 + .long 1016476236 + .long 363667784 + .long 1073080175 + .long 813753950 + .long 1016833785 + .long 2956612997 + .long 1073095806 + .long 2118169751 + .long 3163784129 + .long 2186617381 + .long 1073111608 + .long 2270764084 + .long 3164321289 + .long 1719614413 + .long 1073127582 + .long 330458198 + .long 3164331316 + .long 1013258799 + .long 1073143730 + .long 1748797611 + .long 3161177658 + .long 3907805044 + .long 1073160053 + .long 2257091225 + .long 3162598983 + .long 1447192521 + .long 1073176555 + .long 1462857171 + .long 3163563097 + .long 
1944781191 + .long 1073193236 + .long 3993278767 + .long 3162772855 + .long 919555682 + .long 1073210099 + .long 3121969534 + .long 1013996802 + .long 2571947539 + .long 1073227145 + .long 3558159064 + .long 3164425245 + .long 2604962541 + .long 1073244377 + .long 2614425274 + .long 3164587768 + .long 1110089947 + .long 1073261797 + .long 1451641639 + .long 1016523249 + .long 2568320822 + .long 1073279406 + .long 2732824428 + .long 1015401491 + .long 2966275557 + .long 1073297207 + .long 2176155324 + .long 3160891335 + .long 2682146384 + .long 1073315202 + .long 2082178513 + .long 3164411995 + .long 2191782032 + .long 1073333393 + .long 2960257726 + .long 1014791238 + .long 2069751141 + .long 1073351782 + .long 1562170675 + .long 3163773257 + .long 2990417245 + .long 1073370371 + .long 3683467745 + .long 3164417902 + .long 1434058175 + .long 1073389163 + .long 251133233 + .long 1016134345 + .long 2572866477 + .long 1073408159 + .long 878562433 + .long 1016570317 + .long 3092190715 + .long 1073427362 + .long 814012168 + .long 3160571998 + .long 4076559943 + .long 1073446774 + .long 2119478331 + .long 3161806927 + .long 2420883922 + .long 1073466398 + .long 2049810052 + .long 1015168464 + .long 3716502172 + .long 1073486235 + .long 2303740125 + .long 1015091301 + .long 777507147 + .long 1073506289 + .long 4282924205 + .long 1016236109 + .long 3706687593 + .long 1073526560 + .long 3521726939 + .long 1014301643 + .long 1242007932 + .long 1073547053 + .long 1132034716 + .long 3164388407 + .long 3707479175 + .long 1073567768 + .long 3613079303 + .long 1015213314 + .long 64696965 + .long 1073588710 + .long 1768797490 + .long 1016865536 + .long 863738719 + .long 1073609879 + .long 1326992220 + .long 3163661773 + .long 3884662774 + .long 1073631278 + .long 2158611599 + .long 1015258761 + .long 2728693978 + .long 1073652911 + .long 396109971 + .long 3164511267 + .long 3999357479 + .long 1073674779 + .long 2258941616 + .long 1016973300 + .long 1533953344 + .long 1073696886 + 
.long 769171851 + .long 1016714209 + .long 2174652632 + .long 1073719233 + .long 4087714590 + .long 1015498835 + .long 0 + .long 1073741824 + .long 0 + .long 0 + .long 1697350398 + .long 1079448903 + .long 0 + .long 1127743488 + .long 0 + .long 1071644672 + .long 1431652600 + .long 1069897045 + .long 1431670732 + .long 1067799893 + .long 984555731 + .long 1065423122 + .long 472530941 + .long 1062650218 + .long 2684354560 + .long 1079401119 + .long 4277796864 + .long 1065758274 + .long 3164486458 + .long 1025308570 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 3875694624 + .long 1077247184 + .type __scosh_la_CoutTab,@object + .size __scosh_la_CoutTab,1152 diff --git a/sysdeps/x86_64/fpu/svml_d_cosh2_core.S b/sysdeps/x86_64/fpu/svml_d_cosh2_core.S new file mode 100644 index 0000000000..f95952cfe5 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cosh2_core.S @@ -0,0 +1,29 @@ +/* Function cosh vectorized with SSE2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVbN2v_cosh) +WRAPPER_IMPL_SSE2 cosh +END (_ZGVbN2v_cosh) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN2v_cosh) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_cosh4_core.S b/sysdeps/x86_64/fpu/svml_d_cosh4_core.S new file mode 100644 index 0000000000..cc24d0fb6b --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cosh4_core.S @@ -0,0 +1,29 @@ +/* Function cosh vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVdN4v_cosh) +WRAPPER_IMPL_AVX _ZGVbN2v_cosh +END (_ZGVdN4v_cosh) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN4v_cosh) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S new file mode 100644 index 0000000000..4323f5e308 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S @@ -0,0 +1,25 @@ +/* Function cosh vectorized in AVX ISA as wrapper to SSE4 ISA version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVcN4v_cosh) +WRAPPER_IMPL_AVX _ZGVbN2v_cosh +END (_ZGVcN4v_cosh) diff --git a/sysdeps/x86_64/fpu/svml_d_cosh8_core.S b/sysdeps/x86_64/fpu/svml_d_cosh8_core.S new file mode 100644 index 0000000000..90ee1ca125 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cosh8_core.S @@ -0,0 +1,25 @@ +/* Function cosh vectorized with AVX-512, wrapper to AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVeN8v_cosh) +WRAPPER_IMPL_AVX512 _ZGVdN4v_cosh +END (_ZGVeN8v_cosh) diff --git a/sysdeps/x86_64/fpu/svml_s_coshf16_core.S b/sysdeps/x86_64/fpu/svml_s_coshf16_core.S new file mode 100644 index 0000000000..fe243b8b94 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_coshf16_core.S @@ -0,0 +1,25 @@ +/* Function coshf vectorized with AVX-512. Wrapper to AVX2 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVeN16v_coshf) +WRAPPER_IMPL_AVX512 _ZGVdN8v_coshf +END (_ZGVeN16v_coshf) diff --git a/sysdeps/x86_64/fpu/svml_s_coshf4_core.S b/sysdeps/x86_64/fpu/svml_s_coshf4_core.S new file mode 100644 index 0000000000..b55ede6e38 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_coshf4_core.S @@ -0,0 +1,29 @@ +/* Function coshf vectorized with SSE2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVbN4v_coshf) +WRAPPER_IMPL_SSE2 coshf +END (_ZGVbN4v_coshf) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN4v_coshf) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_coshf8_core.S b/sysdeps/x86_64/fpu/svml_s_coshf8_core.S new file mode 100644 index 0000000000..3ea02d0f19 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_coshf8_core.S @@ -0,0 +1,29 @@ +/* Function coshf vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVdN8v_coshf) +WRAPPER_IMPL_AVX _ZGVbN4v_coshf +END (_ZGVdN8v_coshf) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN8v_coshf) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S new file mode 100644 index 0000000000..9b3002f7c9 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S @@ -0,0 +1,25 @@ +/* Function coshf vectorized in AVX ISA as wrapper to SSE4 ISA version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVcN8v_coshf) +WRAPPER_IMPL_AVX _ZGVbN4v_coshf +END (_ZGVcN8v_coshf) diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c new file mode 100644 index 0000000000..1dd311a562 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c @@ -0,0 +1 @@ +#include "test-double-libmvec-cosh.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c new file mode 100644 index 0000000000..1dd311a562 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c @@ -0,0 +1 @@ +#include "test-double-libmvec-cosh.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c new file mode 100644 index 0000000000..1dd311a562 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c @@ -0,0 +1 @@ +#include "test-double-libmvec-cosh.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c new file mode 100644 index 0000000000..cf49ec5d87 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE double +#define LIBMVEC_FUNC cosh +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c index 76dc92b983..03fcb80d95 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2) VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh) VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt) +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c 
b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c index e16abf5bb0..03e1d9eeda 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2) VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh) VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt) +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh) #ifndef __ILP32__ # define VEC_INT_TYPE __m256i diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c index 84091a860f..ed0f71570a 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2) VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh) VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt) +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c index 873d7aa9c8..3cb0f61667 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2) VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh) VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt) +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh) #ifndef __ILP32__ # define VEC_INT_TYPE __m512i diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c new file mode 100644 index 0000000000..905dc3ca4a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c @@ -0,0 +1 @@ +#include "test-float-libmvec-coshf.c" diff 
--git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c new file mode 100644 index 0000000000..905dc3ca4a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c @@ -0,0 +1 @@ +#include "test-float-libmvec-coshf.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c new file mode 100644 index 0000000000..905dc3ca4a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c @@ -0,0 +1 @@ +#include "test-float-libmvec-coshf.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c new file mode 100644 index 0000000000..94b899076b --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE float +#define LIBMVEC_FUNC coshf +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c index b4bccd8e84..c0ac6809e8 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f) VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf) VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf) +VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf) #define VEC_INT_TYPE __m512i diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c index 1aa2c920ed..5fb185e6fe 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f) VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf) VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf) 
+VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c index 2042aec59e..7ab40a72d0 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f) VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf) VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf) +VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf) /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf. */ #undef VECTOR_WRAPPER_fFF diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c index bb25393c57..efd661db69 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf) VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f) VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf) VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf) +VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf) #define VEC_INT_TYPE __m128i
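Note for readers checking the data tables in this patch: the `.long` entries in `__svml_scosh_data_internal` and `__scosh_la_CoutTab` are raw IEEE-754 bit patterns, not decimal values. The following is a minimal sketch for decoding them and is not part of the patch; the helper names `bits_to_float` and `words_to_double` are hypothetical, and the assumption that consecutive `.long` pairs in `__scosh_la_CoutTab` form one little-endian double (low word first) is inferred from the values, not stated by the patch.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret one 32-bit word (a single `.long` entry) as an
   IEEE-754 binary32 value.  memcpy avoids aliasing problems.  */
static float
bits_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Combine two consecutive `.long` entries into an IEEE-754 binary64
   value.  Assumption: the low word comes first, as it would for a
   double stored in little-endian x86-64 memory.  */
static double
words_to_double (uint32_t lo, uint32_t hi)
{
  uint64_t bits = ((uint64_t) hi << 32) | lo;
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

With these helpers, `bits_to_float (1065353216)` is 1.0f and `bits_to_float (1056964608)` is 0.5f. Likewise `words_to_double (0, 1072693248)` is 1.0 and `words_to_double (0, 1073741824)` is 2.0, the first and last leading pairs of the four-word entries in `__scosh_la_CoutTab`. That suggests, though the patch does not say so, that this part of the table holds values running from 1 to 2.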