From patchwork Wed Nov 24 19:37:48 2021
X-Patchwork-Submitter: Sunil Pandey
X-Patchwork-Id: 48105
From: Sunil Pandey
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com
Subject: [PATCH 23/42] x86-64: Add vector exp10/exp10f implementation to libmvec
Date: Wed, 24 Nov 2021 11:37:48 -0800
Message-Id: <20211124193807.2093208-24-skpgkp2@gmail.com>
In-Reply-To: <20211124193807.2093208-1-skpgkp2@gmail.com>
References: <20211124193807.2093208-1-skpgkp2@gmail.com>

Implement vectorized exp10/exp10f, with SSE, AVX, AVX2 and AVX512
versions for libmvec, as per the vector ABI.  Also add accuracy and
ABI tests for vector exp10/exp10f, with regenerated ulps.
---
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_exp102_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp102_core.c |   27 +
 .../fpu/multiarch/svml_d_exp102_core_sse4.S   | 1664 +++++++++++++++++
 .../fpu/multiarch/svml_d_exp104_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp104_core.c |   27 +
 .../fpu/multiarch/svml_d_exp104_core_avx2.S   | 1503 +++++++++++++++
 .../fpu/multiarch/svml_d_exp108_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp108_core.c |   27 +
 .../fpu/multiarch/svml_d_exp108_core_avx512.S |  987 ++++++++++
 .../fpu/multiarch/svml_s_exp10f16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_exp10f16_core.c      |   28 +
 .../multiarch/svml_s_exp10f16_core_avx512.S   |  739 ++++++++
 .../fpu/multiarch/svml_s_exp10f4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_exp10f4_core.c       |   28 +
 .../fpu/multiarch/svml_s_exp10f4_core_sse4.S  | 1038 ++++++++++
 .../fpu/multiarch/svml_s_exp10f8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_exp10f8_core.c       |   28 +
 .../fpu/multiarch/svml_s_exp10f8_core_avx2.S  |  923 +++++++++
 sysdeps/x86_64/fpu/svml_d_exp102_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_exp104_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_exp108_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_exp10f16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_exp10f4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_exp10f8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S  |   25 +
 .../fpu/test-double-libmvec-exp10-avx.c       |    1 +
 .../fpu/test-double-libmvec-exp10-avx2.c      |    1 +
 .../fpu/test-double-libmvec-exp10-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-exp10.c    |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../fpu/test-float-libmvec-exp10f-avx.c       |    1 +
 .../fpu/test-float-libmvec-exp10f-avx2.c      |    1 +
 .../fpu/test-float-libmvec-exp10f-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-exp10f.c    |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 49 files changed, 7422 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp102_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp108_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index f7e3b27b0a..ff85f5be62 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -219,4 +219,15 @@
 #define __DECL_SIMD_erfcf32x
 #define __DECL_SIMD_erfcf64x
 #define __DECL_SIMD_erfcf128x
+
+#define __DECL_SIMD_exp10
+#define __DECL_SIMD_exp10f
+#define __DECL_SIMD_exp10l
+#define __DECL_SIMD_exp10f16
+#define __DECL_SIMD_exp10f32
+#define __DECL_SIMD_exp10f64
+#define __DECL_SIMD_exp10f128
+#define __DECL_SIMD_exp10f32x
+#define __DECL_SIMD_exp10f64x
+#define __DECL_SIMD_exp10f128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 48bd8fcda8..6fc9d6f156 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -111,7 +111,7 @@ __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
 
 #if __GLIBC_USE (IEC_60559_FUNCS_EXT_C2X)
 /* Compute exponent to base ten.  */
-__MATHCALL (exp10,, (_Mdouble_ __x));
+__MATHCALL_VEC (exp10,, (_Mdouble_ __x));
 #endif
 
 #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 8c17e38ce6..be309ddbf0 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -56,6 +56,7 @@ GLIBC_2.35 _ZGVbN2v_cbrt F
 GLIBC_2.35 _ZGVbN2v_cosh F
 GLIBC_2.35 _ZGVbN2v_erf F
 GLIBC_2.35 _ZGVbN2v_erfc F
+GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2vv_atan2 F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_acoshf F
@@ -67,6 +68,7 @@ GLIBC_2.35 _ZGVbN4v_cbrtf F
 GLIBC_2.35 _ZGVbN4v_coshf F
 GLIBC_2.35 _ZGVbN4v_erfcf F
 GLIBC_2.35 _ZGVbN4v_erff F
+GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4vv_atan2f F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_acosh F
@@ -78,6 +80,7 @@ GLIBC_2.35 _ZGVcN4v_cbrt F
 GLIBC_2.35 _ZGVcN4v_cosh F
 GLIBC_2.35 _ZGVcN4v_erf F
 GLIBC_2.35 _ZGVcN4v_erfc F
+GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4vv_atan2 F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_acoshf F
@@ -89,6 +92,7 @@ GLIBC_2.35 _ZGVcN8v_cbrtf F
 GLIBC_2.35 _ZGVcN8v_coshf F
 GLIBC_2.35 _ZGVcN8v_erfcf F
 GLIBC_2.35 _ZGVcN8v_erff F
+GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8vv_atan2f F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_acosh F
@@ -100,6 +104,7 @@ GLIBC_2.35 _ZGVdN4v_cbrt F
 GLIBC_2.35 _ZGVdN4v_cosh F
 GLIBC_2.35 _ZGVdN4v_erf F
 GLIBC_2.35 _ZGVdN4v_erfc F
+GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4vv_atan2 F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_acoshf F
@@ -111,6 +116,7 @@ GLIBC_2.35 _ZGVdN8v_cbrtf F
 GLIBC_2.35 _ZGVdN8v_coshf F
 GLIBC_2.35 _ZGVdN8v_erfcf F
 GLIBC_2.35 _ZGVdN8v_erff F
+GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8vv_atan2f F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_acoshf F
@@ -122,6 +128,7 @@ GLIBC_2.35 _ZGVeN16v_cbrtf F
 GLIBC_2.35 _ZGVeN16v_coshf F
 GLIBC_2.35 _ZGVeN16v_erfcf F
 GLIBC_2.35 _ZGVeN16v_erff F
+GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16vv_atan2f F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_acosh F
@@ -133,4 +140,5 @@ GLIBC_2.35 _ZGVeN8v_cbrt F
 GLIBC_2.35 _ZGVeN8v_cosh F
 GLIBC_2.35 _ZGVeN8v_erf F
 GLIBC_2.35 _ZGVeN8v_erfc F
+GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8vv_atan2 F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index be16a37989..69d05d334d 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -102,6 +102,10 @@
 # define __DECL_SIMD_erfc __DECL_SIMD_x86_64
 # undef __DECL_SIMD_erfcf
 # define __DECL_SIMD_erfcf __DECL_SIMD_x86_64
+# undef __DECL_SIMD_exp10
+# define __DECL_SIMD_exp10 __DECL_SIMD_x86_64
+# undef __DECL_SIMD_exp10f
+# define __DECL_SIMD_exp10f __DECL_SIMD_x86_64
 # endif
 #endif
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index a8138c8e6b..95709ef402 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -35,6 +35,7 @@ libmvec-funcs = \
   erf \
   erfc \
   exp \
+  exp10 \
   log \
   pow \
   sin \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 40b801b714..f891c07550 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -24,6 +24,7 @@ libmvec {
     _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
     _ZGVbN2v_erf; _ZGVcN4v_erf; _ZGVdN4v_erf; _ZGVeN8v_erf;
     _ZGVbN2v_erfc; _ZGVcN4v_erfc; _ZGVdN4v_erfc; _ZGVeN8v_erfc;
+    _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf;
@@ -35,6 +36,7 @@ libmvec {
     _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
     _ZGVbN4v_erfcf; _ZGVcN8v_erfcf; _ZGVdN8v_erfcf; _ZGVeN16v_erfcf;
     _ZGVbN4v_erff; _ZGVcN8v_erff; _ZGVdN8v_erff; _ZGVeN16v_erff;
+    _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index f1b9921920..4465c0af19 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1409,6 +1409,26 @@
 float: 1
 float128: 3
 ldouble: 2
+Function: "exp10_vlen16":
+float: 3
+
+Function: "exp10_vlen2":
+double: 1
+
+Function: "exp10_vlen4":
+double: 1
+float: 1
+
+Function: "exp10_vlen4_avx2":
+double: 1
+
+Function: "exp10_vlen8":
+double: 1
+float: 1
+
+Function: "exp10_vlen8_avx2":
+float: 1
+
 Function: "exp2":
 double: 1
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
new file mode 100644
index 0000000000..ab615c0323
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized exp10, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_exp10 _ZGVbN2v_exp10_sse2
+#include "../svml_d_exp102_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
new file mode 100644
index 0000000000..5c5625b278
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized exp10, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_exp10
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_exp10, __GI__ZGVbN2v_exp10, __redirect__ZGVbN2v_exp10)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
new file mode 100644
index 0000000000..2ee6a13135
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
@@ -0,0 +1,1664 @@
+/* Function exp10 vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * exp10(x) = 2^x/log10(2) = 2^n * (1 + T[j]) * (1 + P(y)) + * where + * x = m*log10(2)/K + y, y in [-log10(2)/K..log10(2)/K] + * m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + * values of 2^j/K are tabulated + * + * P(y) is a minimax polynomial approximation of exp10(x)-1 + * on small interval [-log10(2)/K..log10(2)/K] + * + * Special cases: + * + * exp10(NaN) = NaN + * exp10(+INF) = +INF + * exp10(-INF) = 0 + * exp10(x) = 1 for subnormals + * For IEEE double + * if x > 3.39782712893383973096e+02 then exp10(x) overflow + * if x < -3.45133219101941108420e+02 then exp10(x) underflow + * + */ + +#include + + .text +ENTRY(_ZGVbN2v_exp10_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + movaps %xmm0, %xmm3 + +/* Load arument */ + movups 1024+__svml_dexp10_data_internal(%rip), %xmm1 + lea __svml_dexp10_data_internal(%rip), %r8 + mulpd %xmm3, %xmm1 + movups 1088+__svml_dexp10_data_internal(%rip), %xmm4 + addpd %xmm4, %xmm1 + movaps %xmm1, %xmm2 + subpd %xmm4, %xmm2 + movq 1728+__svml_dexp10_data_internal(%rip), %xmm5 + +/* Check for overflow\underflow */ + pshufd $221, %xmm3, %xmm7 + movq 1792+__svml_dexp10_data_internal(%rip), %xmm6 + pand %xmm5, %xmm7 + pcmpgtd 
%xmm6, %xmm7 + +/* R */ + movups 1152+__svml_dexp10_data_internal(%rip), %xmm6 + mulpd %xmm2, %xmm6 + movmskps %xmm7, %edx + movq 1664+__svml_dexp10_data_internal(%rip), %xmm0 + +/* Index and lookup */ + pshufd $136, %xmm1, %xmm4 + +/* 2^N */ + psllq $45, %xmm1 + pand %xmm0, %xmm4 + +/* iIndex*=sizeof(D); */ + pslld $3, %xmm4 + movd %xmm4, %eax + pshufd $1, %xmm4, %xmm5 + movups 1216+__svml_dexp10_data_internal(%rip), %xmm4 + mulpd %xmm2, %xmm4 + movd %xmm5, %ecx + movaps %xmm3, %xmm5 + +/* + * Polynomial + * poly(dN) = a1*dR+...+a5*dR^5 + */ + movups 1536+__svml_dexp10_data_internal(%rip), %xmm2 + subpd %xmm6, %xmm5 + subpd %xmm4, %xmm5 + mulpd %xmm5, %xmm2 + addpd 1472+__svml_dexp10_data_internal(%rip), %xmm2 + mulpd %xmm5, %xmm2 + addpd 1408+__svml_dexp10_data_internal(%rip), %xmm2 + mulpd %xmm5, %xmm2 + addpd 1344+__svml_dexp10_data_internal(%rip), %xmm2 + +/* a1+...+a5*dR^4 ! */ + mulpd %xmm5, %xmm2 + addpd 1280+__svml_dexp10_data_internal(%rip), %xmm2 + +/* a1*dR+...+a5*dR^5 */ + mulpd %xmm2, %xmm5 + movslq %eax, %rax + movslq %ecx, %rcx + +/* lM==EXP(2^N) */ + pand 1600+__svml_dexp10_data_internal(%rip), %xmm1 + movsd (%r8,%rax), %xmm0 + movhpd (%r8,%rcx), %xmm0 + +/* Tj*poly */ + mulpd %xmm0, %xmm5 + addpd %xmm5, %xmm0 + +/* quick 2^N */ + paddq %xmm1, %xmm0 + andl $3, %edx + +/* Finish */ + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm3, 192(%rsp) + movups %xmm0, 256(%rsp) + je .LBL_1_2 + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 
0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $2, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 
0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,8), %rdi + lea 256(%rsp,%r12,8), %rsi + call __svml_dexp10_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN2v_exp10_sse4) + + .align 16,0x90 + +__svml_dexp10_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 6(%rdi), %edx + andl $32752, %edx + shrl $4, %edx + movsd (%rdi), %xmm6 + movsd %xmm6, -8(%rsp) + cmpl $2047, %edx + je .LBL_2_17 + cmpl $970, %edx + jle .LBL_2_15 + movsd 1080+__dexp10_la__imldExp10Tab(%rip), %xmm0 + comisd %xmm6, %xmm0 + jb .LBL_2_14 + comisd 1096+__dexp10_la__imldExp10Tab(%rip), %xmm6 + jb .LBL_2_13 + movsd 1024+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movaps %xmm6, %xmm5 + mulsd %xmm6, %xmm0 + lea 
__dexp10_la__imldExp10Tab(%rip), %r10 + movsd %xmm0, -24(%rsp) + movsd -24(%rsp), %xmm1 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rdx + movq %rdx, -8(%rsp) + addsd 1032+__dexp10_la__imldExp10Tab(%rip), %xmm1 + movsd %xmm1, -16(%rsp) + movsd -16(%rsp), %xmm2 + movl -16(%rsp), %r8d + movl %r8d, %ecx + andl $63, %r8d + subsd 1032+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, -24(%rsp) + lea 1(%r8,%r8), %r9d + movsd -24(%rsp), %xmm3 + lea (%r8,%r8), %edi + mulsd 1104+__dexp10_la__imldExp10Tab(%rip), %xmm3 + movsd -24(%rsp), %xmm4 + subsd %xmm3, %xmm5 + mulsd 1112+__dexp10_la__imldExp10Tab(%rip), %xmm4 + movsd 1072+__dexp10_la__imldExp10Tab(%rip), %xmm2 + subsd %xmm4, %xmm5 + mulsd %xmm5, %xmm2 + shrl $6, %ecx + addsd 1064+__dexp10_la__imldExp10Tab(%rip), %xmm2 + comisd 1088+__dexp10_la__imldExp10Tab(%rip), %xmm6 + mulsd %xmm5, %xmm2 + movsd (%r10,%rdi,8), %xmm0 + lea 1023(%rcx), %edx + addsd 1056+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd 1048+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd 1040+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd (%r10,%r9,8), %xmm2 + mulsd %xmm0, %xmm2 + jb .LBL_2_9 + andl $2047, %edx + addsd %xmm0, %xmm2 + cmpl $2046, %edx + ja .LBL_2_8 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm2 + movsd %xmm2, (%rsi) + ret + +.LBL_2_8: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm2 + mulsd 1152+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, (%rsi) + ret + +.LBL_2_9: + addl $1083, %ecx + andl $2047, %ecx + movl %ecx, %eax + movzwl -2(%rsp), %edx + shll $4, %eax + andl $-32753, %edx + orl %eax, %edx + movw %dx, -2(%rsp) + movsd -8(%rsp), %xmm1 + mulsd %xmm1, %xmm2 + mulsd %xmm0, %xmm1 + movaps %xmm1, %xmm0 
+ addsd %xmm2, %xmm0 + cmpl $50, %ecx + ja .LBL_2_11 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movsd %xmm0, (%rsi) + jmp .LBL_2_12 + +.LBL_2_11: + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm0 + subsd %xmm0, %xmm1 + movsd %xmm1, -64(%rsp) + movsd -64(%rsp), %xmm1 + addsd %xmm2, %xmm1 + movsd %xmm1, -64(%rsp) + movsd -72(%rsp), %xmm2 + mulsd 1168+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, -56(%rsp) + movsd -72(%rsp), %xmm4 + movsd -56(%rsp), %xmm3 + addsd %xmm3, %xmm4 + movsd %xmm4, -48(%rsp) + movsd -48(%rsp), %xmm6 + movsd -56(%rsp), %xmm5 + subsd %xmm5, %xmm6 + movsd %xmm6, -40(%rsp) + movsd -72(%rsp), %xmm8 + movsd -40(%rsp), %xmm7 + subsd %xmm7, %xmm8 + movsd %xmm8, -32(%rsp) + movsd -64(%rsp), %xmm10 + movsd -32(%rsp), %xmm9 + addsd %xmm9, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -40(%rsp), %xmm11 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm11 + movsd %xmm11, -40(%rsp) + movsd -32(%rsp), %xmm12 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm12 + movsd %xmm12, -32(%rsp) + movsd -40(%rsp), %xmm14 + movsd -32(%rsp), %xmm13 + addsd %xmm13, %xmm14 + movsd %xmm14, (%rsi) + +.LBL_2_12: + movl $4, %eax + ret + +.LBL_2_13: + movsd 1120+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movl $4, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_14: + movsd 1128+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_15: + movsd 1144+__dexp10_la__imldExp10Tab(%rip), %xmm0 + addsd %xmm6, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_17: + movb -1(%rsp), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_19 + +.LBL_2_18: + mulsd %xmm6, %xmm6 + movsd %xmm6, (%rsi) + ret + +.LBL_2_19: + testl $1048575, -4(%rsp) + jne .LBL_2_18 + cmpl $0, -8(%rsp) + jne .LBL_2_18 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rdx + movq %rdx, (%rsi) + ret + + cfi_endproc + + .type __svml_dexp10_cout_rare_internal,@function + .size __svml_dexp10_cout_rare_internal,.-__svml_dexp10_cout_rare_internal + 
+ .section .rodata, "a" + .align 64 + +__svml_dexp10_data_internal: + .long 0 + .long 1072693248 + .long 2851812149 + .long 1072698941 + .long 1048019041 + .long 1072704666 + .long 3899555717 + .long 1072710421 + .long 3541402996 + .long 1072716208 + .long 702412510 + .long 1072722027 + .long 410360776 + .long 1072727877 + .long 3402036099 + .long 1072733758 + .long 1828292879 + .long 1072739672 + .long 728909815 + .long 1072745618 + .long 852742562 + .long 1072751596 + .long 2952712987 + .long 1072757606 + .long 3490863953 + .long 1072763649 + .long 3228316108 + .long 1072769725 + .long 2930322912 + .long 1072775834 + .long 3366293073 + .long 1072781976 + .long 1014845819 + .long 1072788152 + .long 948735466 + .long 1072794361 + .long 3949972341 + .long 1072800603 + .long 2214878420 + .long 1072806880 + .long 828946858 + .long 1072813191 + .long 586995997 + .long 1072819536 + .long 2288159958 + .long 1072825915 + .long 2440944790 + .long 1072832329 + .long 1853186616 + .long 1072838778 + .long 1337108031 + .long 1072845262 + .long 1709341917 + .long 1072851781 + .long 3790955393 + .long 1072858335 + .long 4112506593 + .long 1072864925 + .long 3504003472 + .long 1072871551 + .long 2799960843 + .long 1072878213 + .long 2839424854 + .long 1072884911 + .long 171030293 + .long 1072891646 + .long 4232894513 + .long 1072898416 + .long 2992903935 + .long 1072905224 + .long 1603444721 + .long 1072912069 + .long 926591435 + .long 1072918951 + .long 1829099622 + .long 1072925870 + .long 887463927 + .long 1072932827 + .long 3272845541 + .long 1072939821 + .long 1276261410 + .long 1072946854 + .long 78413852 + .long 1072953925 + .long 569847338 + .long 1072961034 + .long 3645941911 + .long 1072968181 + .long 1617004845 + .long 1072975368 + .long 3978100823 + .long 1072982593 + .long 3049340112 + .long 1072989858 + .long 4040676318 + .long 1072997162 + .long 3577096743 + .long 1073004506 + .long 2583551245 + .long 1073011890 + .long 1990012071 + .long 1073019314 + .long 
2731501122 + .long 1073026778 + .long 1453150082 + .long 1073034283 + .long 3395129871 + .long 1073041828 + .long 917841882 + .long 1073049415 + .long 3566716925 + .long 1073057042 + .long 3712504873 + .long 1073064711 + .long 2321106615 + .long 1073072422 + .long 363667784 + .long 1073080175 + .long 3111574537 + .long 1073087969 + .long 2956612997 + .long 1073095806 + .long 885834528 + .long 1073103686 + .long 2186617381 + .long 1073111608 + .long 3561793907 + .long 1073119573 + .long 1719614413 + .long 1073127582 + .long 1963711167 + .long 1073135634 + .long 1013258799 + .long 1073143730 + .long 4182873220 + .long 1073151869 + .long 3907805044 + .long 1073160053 + .long 1218806132 + .long 1073168282 + .long 1447192521 + .long 1073176555 + .long 1339972927 + .long 1073184873 + .long 1944781191 + .long 1073193236 + .long 19972402 + .long 1073201645 + .long 919555682 + .long 1073210099 + .long 1413356050 + .long 1073218599 + .long 2571947539 + .long 1073227145 + .long 1176749997 + .long 1073235738 + .long 2604962541 + .long 1073244377 + .long 3649726105 + .long 1073253063 + .long 1110089947 + .long 1073261797 + .long 380978316 + .long 1073270578 + .long 2568320822 + .long 1073279406 + .long 194117574 + .long 1073288283 + .long 2966275557 + .long 1073297207 + .long 3418903055 + .long 1073306180 + .long 2682146384 + .long 1073315202 + .long 1892288442 + .long 1073324273 + .long 2191782032 + .long 1073333393 + .long 434316067 + .long 1073342563 + .long 2069751141 + .long 1073351782 + .long 3964284211 + .long 1073361051 + .long 2990417245 + .long 1073370371 + .long 321958744 + .long 1073379742 + .long 1434058175 + .long 1073389163 + .long 3218338682 + .long 1073398635 + .long 2572866477 + .long 1073408159 + .long 697153126 + .long 1073417735 + .long 3092190715 + .long 1073427362 + .long 2380618042 + .long 1073437042 + .long 4076559943 + .long 1073446774 + .long 815859274 + .long 1073456560 + .long 2420883922 + .long 1073466398 + .long 1540824585 + .long 1073476290 + 
.long 3716502172 + .long 1073486235 + .long 1610600570 + .long 1073496235 + .long 777507147 + .long 1073506289 + .long 2483480501 + .long 1073516397 + .long 3706687593 + .long 1073526560 + .long 1432208378 + .long 1073536779 + .long 1242007932 + .long 1073547053 + .long 135105010 + .long 1073557383 + .long 3707479175 + .long 1073567768 + .long 382305176 + .long 1073578211 + .long 64696965 + .long 1073588710 + .long 4076975200 + .long 1073599265 + .long 863738719 + .long 1073609879 + .long 351641897 + .long 1073620550 + .long 3884662774 + .long 1073631278 + .long 4224142467 + .long 1073642065 + .long 2728693978 + .long 1073652911 + .long 764307441 + .long 1073663816 + .long 3999357479 + .long 1073674779 + .long 929806999 + .long 1073685803 + .long 1533953344 + .long 1073696886 + .long 2912730644 + .long 1073708029 + .long 2174652632 + .long 1073719233 + .long 730821105 + .long 1073730498 + .long 158966641 + .long 1081774927 + .long 158966641 + .long 1081774927 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1127776256 + .long 0 + .long 1127776256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 
1352663040 + .long 1063470099 + .long 1352663040 + .long 1063470099 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2646727321 + .long 3171991585 + .long 2646727321 + .long 3171991585 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3149223190 + .long 1073900465 + .long 3149223190 + .long 1073900465 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3342657763 + .long 1074083108 + .long 3342657763 + .long 1074083108 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2446125707 + .long 1073760005 + .long 2446125707 + .long 1073760005 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1971641177 + .long 1072872822 + .long 1971641177 + .long 1072872822 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056635732 + .long 1071727808 + .long 1056635732 + .long 1071727808 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 127 + .long 127 + .long 127 + .long 127 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dexp10_data_internal,@object + .size __svml_dexp10_data_internal,1856 + .align 32 + +__dexp10_la__imldExp10Tab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 2631457885 + .long 3161546771 + .long 3541402996 + .long 1072716208 + .long 896005651 + .long 1015861842 + .long 410360776 + .long 1072727877 + .long 1642514529 + .long 1012987726 + .long 1828292879 + .long 1072739672 + .long 1568897901 + .long 1016568486 + .long 852742562 + .long 1072751596 + .long 1882168529 + .long 1010744893 + .long 3490863953 + .long 1072763649 + .long 707771662 + .long 3163903570 + .long 2930322912 + .long 1072775834 + .long 3117806614 + .long 3163670819 + .long 1014845819 + .long 1072788152 + .long 3936719688 + .long 3162512149 + .long 3949972341 + .long 1072800603 + .long 1058231231 + .long 1015777676 + .long 828946858 + .long 1072813191 + .long 1044000608 + .long 1016786167 + .long 2288159958 + .long 1072825915 + .long 1151779725 + .long 1015705409 + .long 1853186616 + .long 1072838778 + .long 3819481236 + .long 1016499965 + .long 1709341917 + .long 1072851781 + .long 2552227826 + .long 1015039787 + .long 4112506593 + .long 1072864925 + .long 1829350193 + .long 1015216097 + .long 2799960843 + .long 1072878213 + .long 1913391796 + .long 1015756674 + .long 171030293 + .long 1072891646 + .long 1303423926 + .long 1015238005 + .long 2992903935 + .long 1072905224 + .long 1574172746 + .long 1016061241 + .long 926591435 + .long 1072918951 + .long 3427487848 + .long 3163704045 + .long 887463927 + .long 1072932827 + .long 1049900754 + .long 3161575912 + .long 1276261410 + .long 1072946854 + .long 2804567149 + .long 1015390024 + .long 569847338 + .long 1072961034 + .long 1209502043 + .long 3159926671 + .long 1617004845 + .long 1072975368 + .long 1623370769 + .long 1011049453 + .long 3049340112 + .long 1072989858 + .long 3667985273 + .long 1013894369 + .long 
3577096743 + .long 1073004506 + .long 3145379760 + .long 1014403278 + .long 1990012071 + .long 1073019314 + .long 7447438 + .long 3163526196 + .long 1453150082 + .long 1073034283 + .long 3171891295 + .long 3162037958 + .long 917841882 + .long 1073049415 + .long 419288974 + .long 1016280325 + .long 3712504873 + .long 1073064711 + .long 3793507337 + .long 1016095713 + .long 363667784 + .long 1073080175 + .long 728023093 + .long 1016345318 + .long 2956612997 + .long 1073095806 + .long 1005538728 + .long 3163304901 + .long 2186617381 + .long 1073111608 + .long 2018924632 + .long 3163803357 + .long 1719614413 + .long 1073127582 + .long 3210617384 + .long 3163796463 + .long 1013258799 + .long 1073143730 + .long 3094194670 + .long 3160631279 + .long 3907805044 + .long 1073160053 + .long 2119843535 + .long 3161988964 + .long 1447192521 + .long 1073176555 + .long 508946058 + .long 3162904882 + .long 1944781191 + .long 1073193236 + .long 3108873501 + .long 3162190556 + .long 919555682 + .long 1073210099 + .long 2882956373 + .long 1013312481 + .long 2571947539 + .long 1073227145 + .long 4047189812 + .long 3163777462 + .long 2604962541 + .long 1073244377 + .long 3631372142 + .long 3163870288 + .long 1110089947 + .long 1073261797 + .long 3253791412 + .long 1015920431 + .long 2568320822 + .long 1073279406 + .long 1509121860 + .long 1014756995 + .long 2966275557 + .long 1073297207 + .long 2339118633 + .long 3160254904 + .long 2682146384 + .long 1073315202 + .long 586480042 + .long 3163702083 + .long 2191782032 + .long 1073333393 + .long 730975783 + .long 1014083580 + .long 2069751141 + .long 1073351782 + .long 576856675 + .long 3163014404 + .long 2990417245 + .long 1073370371 + .long 3552361237 + .long 3163667409 + .long 1434058175 + .long 1073389163 + .long 1853053619 + .long 1015310724 + .long 2572866477 + .long 1073408159 + .long 2462790535 + .long 1015814775 + .long 3092190715 + .long 1073427362 + .long 1457303226 + .long 3159737305 + .long 4076559943 + .long 1073446774 + 
.long 950899508 + .long 3160987380 + .long 2420883922 + .long 1073466398 + .long 174054861 + .long 1014300631 + .long 3716502172 + .long 1073486235 + .long 816778419 + .long 1014197934 + .long 777507147 + .long 1073506289 + .long 3507050924 + .long 1015341199 + .long 3706687593 + .long 1073526560 + .long 1821514088 + .long 1013410604 + .long 1242007932 + .long 1073547053 + .long 1073740399 + .long 3163532637 + .long 3707479175 + .long 1073567768 + .long 2789017511 + .long 1014276997 + .long 64696965 + .long 1073588710 + .long 3586233004 + .long 1015962192 + .long 863738719 + .long 1073609879 + .long 129252895 + .long 3162690849 + .long 3884662774 + .long 1073631278 + .long 1614448851 + .long 1014281732 + .long 2728693978 + .long 1073652911 + .long 2413007344 + .long 3163551506 + .long 3999357479 + .long 1073674779 + .long 1101668360 + .long 1015989180 + .long 1533953344 + .long 1073696886 + .long 835814894 + .long 1015702697 + .long 2174652632 + .long 1073719233 + .long 1301400989 + .long 1014466875 + .long 158966641 + .long 1080726351 + .long 0 + .long 1127743488 + .long 3149223189 + .long 1073900465 + .long 3342651914 + .long 1074083108 + .long 2447555600 + .long 1073760005 + .long 3091263679 + .long 1072872823 + .long 2277968517 + .long 1071727228 + .long 1352628734 + .long 1081295891 + .long 1190603329 + .long 3228777073 + .long 1189309266 + .long 3228842423 + .long 1352597504 + .long 1064518675 + .long 3296479949 + .long 1025408956 + .long 1 + .long 1048576 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 0 + .long 1009778688 + .long 0 + .long 1106771968 + .type __dexp10_la__imldExp10Tab,@object + .size __dexp10_la__imldExp10Tab,1176 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S new file mode 100644 index 0000000000..260c052143 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S @@ -0,0 
+1,20 @@ +/* SSE version of vectorized exp10, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVdN4v_exp10 _ZGVdN4v_exp10_sse_wrapper +#include "../svml_d_exp104_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c new file mode 100644 index 0000000000..e3e302be72 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized exp10, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#define SYMBOL_NAME _ZGVdN4v_exp10 +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4v_exp10, __GI__ZGVdN4v_exp10, __redirect__ZGVdN4v_exp10) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S new file mode 100644 index 0000000000..8c4e582775 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S @@ -0,0 +1,1503 @@ +/* Function exp10 vectorized with AVX2, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * + * exp10(x) = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y)) + * where + * x = m*log10(2)/K + y, y in [-log10(2)/K..log10(2)/K] + * m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + * values of 2^j/K are tabulated + * + * P(y) is a minimax polynomial approximation of exp10(x)-1 + * on small interval [-log10(2)/K..log10(2)/K] + * + * Special cases: + * + * exp10(NaN) = NaN + * exp10(+INF) = +INF + * exp10(-INF) = 0 + * exp10(x) = 1 for subnormals + * For IEEE double + * if x > 3.39782712893383973096e+02 then exp10(x) overflow + * if x < -3.45133219101941108420e+02 then exp10(x) underflow + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVdN4v_exp10_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + lea __svml_dexp10_data_internal(%rip), %rax + vmovapd %ymm0, %ymm2 + vmovupd 1088+__svml_dexp10_data_internal(%rip), %ymm3 + +/* Load argument */ + vmovupd 1024+__svml_dexp10_data_internal(%rip), %ymm0 + vmovups %ymm12, 224(%rsp) + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + +/* R */ + vmovupd 1152+__svml_dexp10_data_internal(%rip), %ymm12 + vmovups %ymm13, 256(%rsp) + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + +/* + * Polynomial + * poly(dN) = a1*dR+...+a5*dR^5 + */ + vmovupd 1536+__svml_dexp10_data_internal(%rip), %ymm13 + vfmadd213pd %ymm3, %ymm2, %ymm0 + vmovups %ymm8, 32(%rsp) + vmovups %ymm9, 96(%rsp) + vmovups %ymm11, 192(%rsp) + vmovups %ymm10, 160(%rsp) + vmovups %ymm14, 288(%rsp) + vmovups %ymm15, 320(%rsp) + vsubpd %ymm3, %ymm0, %ymm1 + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + 
.cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + vfnmadd213pd %ymm2, %ymm1, %ymm12 + +/* Check for overflow/underflow */ + vextractf128 $1, %ymm2, %xmm4 + vfnmadd132pd 1216+__svml_dexp10_data_internal(%rip), %ymm12, %ymm1 + vshufps $221, %xmm4, %xmm2, %xmm5 + vandps 1728+__svml_dexp10_data_internal(%rip), %xmm5, %xmm6 + vpcmpgtd 1792+__svml_dexp10_data_internal(%rip), %xmm6, %xmm7 + vmovmskps %xmm7, %edx + vfmadd213pd 1472+__svml_dexp10_data_internal(%rip), %ymm1, %ymm13 + vfmadd213pd 1408+__svml_dexp10_data_internal(%rip), %ymm1, %ymm13 + vfmadd213pd 1344+__svml_dexp10_data_internal(%rip), %ymm1, %ymm13 + +/* a1+...+a5*dR^4 ! 
*/ + vfmadd213pd 1280+__svml_dexp10_data_internal(%rip), %ymm1, %ymm13 + +/* a1*dR+...+a5*dR^5 */ + vmulpd %ymm13, %ymm1, %ymm1 + +/* Index and lookup */ + vextractf128 $1, %ymm0, %xmm3 + +/* 2^N */ + vpsllq $45, %ymm0, %ymm11 + vshufps $136, %xmm3, %xmm0, %xmm4 + vandps 1664+__svml_dexp10_data_internal(%rip), %xmm4, %xmm5 + +/* iIndex*=sizeof(D); */ + vpslld $3, %xmm5, %xmm8 + vmovd %xmm8, %ecx + +/* lM==EXP(2^N) */ + vpand 1600+__svml_dexp10_data_internal(%rip), %ymm11, %ymm15 + vpextrd $2, %xmm8, %r9d + movslq %ecx, %rcx + vpextrd $1, %xmm8, %r8d + movslq %r9d, %r9 + vpextrd $3, %xmm8, %r10d + movslq %r8d, %r8 + movslq %r10d, %r10 + vmovsd (%rax,%rcx), %xmm6 + vmovsd (%rax,%r9), %xmm9 + vmovhpd (%rax,%r8), %xmm6, %xmm7 + vmovhpd (%rax,%r10), %xmm9, %xmm10 + vinsertf128 $1, %xmm10, %ymm7, %ymm14 + +/* Tj*poly */ + vfmadd213pd %ymm14, %ymm14, %ymm1 + +/* quick 2^N */ + vpaddq %ymm15, %ymm1, %ymm0 + +/* Finish */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 192(%rsp), %ymm11 + cfi_restore(94) + vmovups 224(%rsp), %ymm12 + cfi_restore(95) + vmovups 256(%rsp), %ymm13 + cfi_restore(96) + vmovups 288(%rsp), %ymm14 + cfi_restore(97) + vmovups 320(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 
0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovupd %ymm2, 64(%rsp) + vmovupd %ymm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovupd 128(%rsp), %ymm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + call __svml_dexp10_cout_rare_internal + jmp 
.LBL_1_8 + +END(_ZGVdN4v_exp10_avx2) + + .align 16,0x90 + +__svml_dexp10_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 6(%rdi), %edx + andl $32752, %edx + shrl $4, %edx + movsd (%rdi), %xmm6 + movsd %xmm6, -8(%rsp) + cmpl $2047, %edx + je .LBL_2_17 + cmpl $970, %edx + jle .LBL_2_15 + movsd 1080+__dexp10_la__imldExp10Tab(%rip), %xmm0 + comisd %xmm6, %xmm0 + jb .LBL_2_14 + comisd 1096+__dexp10_la__imldExp10Tab(%rip), %xmm6 + jb .LBL_2_13 + movsd 1024+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movaps %xmm6, %xmm5 + mulsd %xmm6, %xmm0 + lea __dexp10_la__imldExp10Tab(%rip), %r10 + movsd %xmm0, -24(%rsp) + movsd -24(%rsp), %xmm1 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rdx + movq %rdx, -8(%rsp) + addsd 1032+__dexp10_la__imldExp10Tab(%rip), %xmm1 + movsd %xmm1, -16(%rsp) + movsd -16(%rsp), %xmm2 + movl -16(%rsp), %r8d + movl %r8d, %ecx + andl $63, %r8d + subsd 1032+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, -24(%rsp) + lea 1(%r8,%r8), %r9d + movsd -24(%rsp), %xmm3 + lea (%r8,%r8), %edi + mulsd 1104+__dexp10_la__imldExp10Tab(%rip), %xmm3 + movsd -24(%rsp), %xmm4 + subsd %xmm3, %xmm5 + mulsd 1112+__dexp10_la__imldExp10Tab(%rip), %xmm4 + movsd 1072+__dexp10_la__imldExp10Tab(%rip), %xmm2 + subsd %xmm4, %xmm5 + mulsd %xmm5, %xmm2 + shrl $6, %ecx + addsd 1064+__dexp10_la__imldExp10Tab(%rip), %xmm2 + comisd 1088+__dexp10_la__imldExp10Tab(%rip), %xmm6 + mulsd %xmm5, %xmm2 + movsd (%r10,%rdi,8), %xmm0 + lea 1023(%rcx), %edx + addsd 1056+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd 1048+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd 1040+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd (%r10,%r9,8), %xmm2 + mulsd %xmm0, %xmm2 + jb .LBL_2_9 + andl $2047, %edx + addsd %xmm0, %xmm2 + cmpl $2046, %edx + ja .LBL_2_8 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd 
%xmm0, %xmm2 + movsd %xmm2, (%rsi) + ret + +.LBL_2_8: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm2 + mulsd 1152+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, (%rsi) + ret + +.LBL_2_9: + addl $1083, %ecx + andl $2047, %ecx + movl %ecx, %eax + movzwl -2(%rsp), %edx + shll $4, %eax + andl $-32753, %edx + orl %eax, %edx + movw %dx, -2(%rsp) + movsd -8(%rsp), %xmm1 + mulsd %xmm1, %xmm2 + mulsd %xmm0, %xmm1 + movaps %xmm1, %xmm0 + addsd %xmm2, %xmm0 + cmpl $50, %ecx + ja .LBL_2_11 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movsd %xmm0, (%rsi) + jmp .LBL_2_12 + +.LBL_2_11: + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm0 + subsd %xmm0, %xmm1 + movsd %xmm1, -64(%rsp) + movsd -64(%rsp), %xmm1 + addsd %xmm2, %xmm1 + movsd %xmm1, -64(%rsp) + movsd -72(%rsp), %xmm2 + mulsd 1168+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, -56(%rsp) + movsd -72(%rsp), %xmm4 + movsd -56(%rsp), %xmm3 + addsd %xmm3, %xmm4 + movsd %xmm4, -48(%rsp) + movsd -48(%rsp), %xmm6 + movsd -56(%rsp), %xmm5 + subsd %xmm5, %xmm6 + movsd %xmm6, -40(%rsp) + movsd -72(%rsp), %xmm8 + movsd -40(%rsp), %xmm7 + subsd %xmm7, %xmm8 + movsd %xmm8, -32(%rsp) + movsd -64(%rsp), %xmm10 + movsd -32(%rsp), %xmm9 + addsd %xmm9, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -40(%rsp), %xmm11 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm11 + movsd %xmm11, -40(%rsp) + movsd -32(%rsp), %xmm12 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm12 + movsd %xmm12, -32(%rsp) + movsd -40(%rsp), %xmm14 + movsd -32(%rsp), %xmm13 + addsd %xmm13, %xmm14 + movsd %xmm14, (%rsi) + +.LBL_2_12: + movl $4, %eax + ret + +.LBL_2_13: + movsd 1120+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movl $4, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_14: + movsd 1128+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_15: + 
movsd 1144+__dexp10_la__imldExp10Tab(%rip), %xmm0 + addsd %xmm6, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_17: + movb -1(%rsp), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_19 + +.LBL_2_18: + mulsd %xmm6, %xmm6 + movsd %xmm6, (%rsi) + ret + +.LBL_2_19: + testl $1048575, -4(%rsp) + jne .LBL_2_18 + cmpl $0, -8(%rsp) + jne .LBL_2_18 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rdx + movq %rdx, (%rsi) + ret + + cfi_endproc + + .type __svml_dexp10_cout_rare_internal,@function + .size __svml_dexp10_cout_rare_internal,.-__svml_dexp10_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dexp10_data_internal: + .long 0 + .long 1072693248 + .long 2851812149 + .long 1072698941 + .long 1048019041 + .long 1072704666 + .long 3899555717 + .long 1072710421 + .long 3541402996 + .long 1072716208 + .long 702412510 + .long 1072722027 + .long 410360776 + .long 1072727877 + .long 3402036099 + .long 1072733758 + .long 1828292879 + .long 1072739672 + .long 728909815 + .long 1072745618 + .long 852742562 + .long 1072751596 + .long 2952712987 + .long 1072757606 + .long 3490863953 + .long 1072763649 + .long 3228316108 + .long 1072769725 + .long 2930322912 + .long 1072775834 + .long 3366293073 + .long 1072781976 + .long 1014845819 + .long 1072788152 + .long 948735466 + .long 1072794361 + .long 3949972341 + .long 1072800603 + .long 2214878420 + .long 1072806880 + .long 828946858 + .long 1072813191 + .long 586995997 + .long 1072819536 + .long 2288159958 + .long 1072825915 + .long 2440944790 + .long 1072832329 + .long 1853186616 + .long 1072838778 + .long 1337108031 + .long 1072845262 + .long 1709341917 + .long 1072851781 + .long 3790955393 + .long 1072858335 + .long 4112506593 + .long 1072864925 + .long 3504003472 + .long 1072871551 + .long 2799960843 + .long 1072878213 + .long 2839424854 + .long 1072884911 + .long 171030293 + .long 1072891646 + .long 4232894513 + .long 1072898416 + .long 2992903935 + .long 1072905224 + .long 1603444721 + .long 1072912069 + .long 
926591435 + .long 1072918951 + .long 1829099622 + .long 1072925870 + .long 887463927 + .long 1072932827 + .long 3272845541 + .long 1072939821 + .long 1276261410 + .long 1072946854 + .long 78413852 + .long 1072953925 + .long 569847338 + .long 1072961034 + .long 3645941911 + .long 1072968181 + .long 1617004845 + .long 1072975368 + .long 3978100823 + .long 1072982593 + .long 3049340112 + .long 1072989858 + .long 4040676318 + .long 1072997162 + .long 3577096743 + .long 1073004506 + .long 2583551245 + .long 1073011890 + .long 1990012071 + .long 1073019314 + .long 2731501122 + .long 1073026778 + .long 1453150082 + .long 1073034283 + .long 3395129871 + .long 1073041828 + .long 917841882 + .long 1073049415 + .long 3566716925 + .long 1073057042 + .long 3712504873 + .long 1073064711 + .long 2321106615 + .long 1073072422 + .long 363667784 + .long 1073080175 + .long 3111574537 + .long 1073087969 + .long 2956612997 + .long 1073095806 + .long 885834528 + .long 1073103686 + .long 2186617381 + .long 1073111608 + .long 3561793907 + .long 1073119573 + .long 1719614413 + .long 1073127582 + .long 1963711167 + .long 1073135634 + .long 1013258799 + .long 1073143730 + .long 4182873220 + .long 1073151869 + .long 3907805044 + .long 1073160053 + .long 1218806132 + .long 1073168282 + .long 1447192521 + .long 1073176555 + .long 1339972927 + .long 1073184873 + .long 1944781191 + .long 1073193236 + .long 19972402 + .long 1073201645 + .long 919555682 + .long 1073210099 + .long 1413356050 + .long 1073218599 + .long 2571947539 + .long 1073227145 + .long 1176749997 + .long 1073235738 + .long 2604962541 + .long 1073244377 + .long 3649726105 + .long 1073253063 + .long 1110089947 + .long 1073261797 + .long 380978316 + .long 1073270578 + .long 2568320822 + .long 1073279406 + .long 194117574 + .long 1073288283 + .long 2966275557 + .long 1073297207 + .long 3418903055 + .long 1073306180 + .long 2682146384 + .long 1073315202 + .long 1892288442 + .long 1073324273 + .long 2191782032 + .long 1073333393 + 
.long 434316067 + .long 1073342563 + .long 2069751141 + .long 1073351782 + .long 3964284211 + .long 1073361051 + .long 2990417245 + .long 1073370371 + .long 321958744 + .long 1073379742 + .long 1434058175 + .long 1073389163 + .long 3218338682 + .long 1073398635 + .long 2572866477 + .long 1073408159 + .long 697153126 + .long 1073417735 + .long 3092190715 + .long 1073427362 + .long 2380618042 + .long 1073437042 + .long 4076559943 + .long 1073446774 + .long 815859274 + .long 1073456560 + .long 2420883922 + .long 1073466398 + .long 1540824585 + .long 1073476290 + .long 3716502172 + .long 1073486235 + .long 1610600570 + .long 1073496235 + .long 777507147 + .long 1073506289 + .long 2483480501 + .long 1073516397 + .long 3706687593 + .long 1073526560 + .long 1432208378 + .long 1073536779 + .long 1242007932 + .long 1073547053 + .long 135105010 + .long 1073557383 + .long 3707479175 + .long 1073567768 + .long 382305176 + .long 1073578211 + .long 64696965 + .long 1073588710 + .long 4076975200 + .long 1073599265 + .long 863738719 + .long 1073609879 + .long 351641897 + .long 1073620550 + .long 3884662774 + .long 1073631278 + .long 4224142467 + .long 1073642065 + .long 2728693978 + .long 1073652911 + .long 764307441 + .long 1073663816 + .long 3999357479 + .long 1073674779 + .long 929806999 + .long 1073685803 + .long 1533953344 + .long 1073696886 + .long 2912730644 + .long 1073708029 + .long 2174652632 + .long 1073719233 + .long 730821105 + .long 1073730498 + .long 158966641 + .long 1081774927 + .long 158966641 + .long 1081774927 + .long 158966641 + .long 1081774927 + .long 158966641 + .long 1081774927 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1127776256 + .long 0 + .long 1127776256 + .long 
0 + .long 1127776256 + .long 0 + .long 1127776256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1352663040 + .long 1063470099 + .long 1352663040 + .long 1063470099 + .long 1352663040 + .long 1063470099 + .long 1352663040 + .long 1063470099 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2646727321 + .long 3171991585 + .long 2646727321 + .long 3171991585 + .long 2646727321 + .long 3171991585 + .long 2646727321 + .long 3171991585 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3149223190 + .long 1073900465 + .long 3149223190 + .long 1073900465 + .long 3149223190 + .long 1073900465 + .long 3149223190 + .long 1073900465 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3342657763 + .long 1074083108 + .long 3342657763 + .long 1074083108 + .long 3342657763 + .long 1074083108 + .long 3342657763 + .long 1074083108 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2446125707 + .long 1073760005 + .long 2446125707 + .long 1073760005 + .long 2446125707 + .long 1073760005 + .long 2446125707 + .long 1073760005 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1971641177 + .long 1072872822 + .long 1971641177 + .long 1072872822 + .long 1971641177 + .long 1072872822 + .long 1971641177 + .long 1072872822 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056635732 + .long 1071727808 + .long 1056635732 + .long 1071727808 + .long 1056635732 + .long 1071727808 + .long 1056635732 + .long 1071727808 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 
+ .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 127 + .long 127 + .long 127 + .long 127 + .long 127 + .long 127 + .long 127 + .long 127 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .long 1081293424 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dexp10_data_internal,@object + .size __svml_dexp10_data_internal,1856 + .align 32 + +__dexp10_la__imldExp10Tab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 2631457885 + .long 3161546771 + .long 3541402996 + .long 1072716208 + .long 896005651 + .long 1015861842 + .long 410360776 + .long 1072727877 + .long 1642514529 + .long 1012987726 + .long 1828292879 + .long 1072739672 + .long 1568897901 + .long 1016568486 + .long 852742562 + .long 1072751596 + .long 
1882168529 + .long 1010744893 + .long 3490863953 + .long 1072763649 + .long 707771662 + .long 3163903570 + .long 2930322912 + .long 1072775834 + .long 3117806614 + .long 3163670819 + .long 1014845819 + .long 1072788152 + .long 3936719688 + .long 3162512149 + .long 3949972341 + .long 1072800603 + .long 1058231231 + .long 1015777676 + .long 828946858 + .long 1072813191 + .long 1044000608 + .long 1016786167 + .long 2288159958 + .long 1072825915 + .long 1151779725 + .long 1015705409 + .long 1853186616 + .long 1072838778 + .long 3819481236 + .long 1016499965 + .long 1709341917 + .long 1072851781 + .long 2552227826 + .long 1015039787 + .long 4112506593 + .long 1072864925 + .long 1829350193 + .long 1015216097 + .long 2799960843 + .long 1072878213 + .long 1913391796 + .long 1015756674 + .long 171030293 + .long 1072891646 + .long 1303423926 + .long 1015238005 + .long 2992903935 + .long 1072905224 + .long 1574172746 + .long 1016061241 + .long 926591435 + .long 1072918951 + .long 3427487848 + .long 3163704045 + .long 887463927 + .long 1072932827 + .long 1049900754 + .long 3161575912 + .long 1276261410 + .long 1072946854 + .long 2804567149 + .long 1015390024 + .long 569847338 + .long 1072961034 + .long 1209502043 + .long 3159926671 + .long 1617004845 + .long 1072975368 + .long 1623370769 + .long 1011049453 + .long 3049340112 + .long 1072989858 + .long 3667985273 + .long 1013894369 + .long 3577096743 + .long 1073004506 + .long 3145379760 + .long 1014403278 + .long 1990012071 + .long 1073019314 + .long 7447438 + .long 3163526196 + .long 1453150082 + .long 1073034283 + .long 3171891295 + .long 3162037958 + .long 917841882 + .long 1073049415 + .long 419288974 + .long 1016280325 + .long 3712504873 + .long 1073064711 + .long 3793507337 + .long 1016095713 + .long 363667784 + .long 1073080175 + .long 728023093 + .long 1016345318 + .long 2956612997 + .long 1073095806 + .long 1005538728 + .long 3163304901 + .long 2186617381 + .long 1073111608 + .long 2018924632 + .long 3163803357 + 
.long 1719614413 + .long 1073127582 + .long 3210617384 + .long 3163796463 + .long 1013258799 + .long 1073143730 + .long 3094194670 + .long 3160631279 + .long 3907805044 + .long 1073160053 + .long 2119843535 + .long 3161988964 + .long 1447192521 + .long 1073176555 + .long 508946058 + .long 3162904882 + .long 1944781191 + .long 1073193236 + .long 3108873501 + .long 3162190556 + .long 919555682 + .long 1073210099 + .long 2882956373 + .long 1013312481 + .long 2571947539 + .long 1073227145 + .long 4047189812 + .long 3163777462 + .long 2604962541 + .long 1073244377 + .long 3631372142 + .long 3163870288 + .long 1110089947 + .long 1073261797 + .long 3253791412 + .long 1015920431 + .long 2568320822 + .long 1073279406 + .long 1509121860 + .long 1014756995 + .long 2966275557 + .long 1073297207 + .long 2339118633 + .long 3160254904 + .long 2682146384 + .long 1073315202 + .long 586480042 + .long 3163702083 + .long 2191782032 + .long 1073333393 + .long 730975783 + .long 1014083580 + .long 2069751141 + .long 1073351782 + .long 576856675 + .long 3163014404 + .long 2990417245 + .long 1073370371 + .long 3552361237 + .long 3163667409 + .long 1434058175 + .long 1073389163 + .long 1853053619 + .long 1015310724 + .long 2572866477 + .long 1073408159 + .long 2462790535 + .long 1015814775 + .long 3092190715 + .long 1073427362 + .long 1457303226 + .long 3159737305 + .long 4076559943 + .long 1073446774 + .long 950899508 + .long 3160987380 + .long 2420883922 + .long 1073466398 + .long 174054861 + .long 1014300631 + .long 3716502172 + .long 1073486235 + .long 816778419 + .long 1014197934 + .long 777507147 + .long 1073506289 + .long 3507050924 + .long 1015341199 + .long 3706687593 + .long 1073526560 + .long 1821514088 + .long 1013410604 + .long 1242007932 + .long 1073547053 + .long 1073740399 + .long 3163532637 + .long 3707479175 + .long 1073567768 + .long 2789017511 + .long 1014276997 + .long 64696965 + .long 1073588710 + .long 3586233004 + .long 1015962192 + .long 863738719 + .long 1073609879 
+ .long 129252895 + .long 3162690849 + .long 3884662774 + .long 1073631278 + .long 1614448851 + .long 1014281732 + .long 2728693978 + .long 1073652911 + .long 2413007344 + .long 3163551506 + .long 3999357479 + .long 1073674779 + .long 1101668360 + .long 1015989180 + .long 1533953344 + .long 1073696886 + .long 835814894 + .long 1015702697 + .long 2174652632 + .long 1073719233 + .long 1301400989 + .long 1014466875 + .long 158966641 + .long 1080726351 + .long 0 + .long 1127743488 + .long 3149223189 + .long 1073900465 + .long 3342651914 + .long 1074083108 + .long 2447555600 + .long 1073760005 + .long 3091263679 + .long 1072872823 + .long 2277968517 + .long 1071727228 + .long 1352628734 + .long 1081295891 + .long 1190603329 + .long 3228777073 + .long 1189309266 + .long 3228842423 + .long 1352597504 + .long 1064518675 + .long 3296479949 + .long 1025408956 + .long 1 + .long 1048576 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 0 + .long 1009778688 + .long 0 + .long 1106771968 + .type __dexp10_la__imldExp10Tab,@object + .size __dexp10_la__imldExp10Tab,1176 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S new file mode 100644 index 0000000000..3aff9446d3 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized exp10, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVeN8v_exp10 _ZGVeN8v_exp10_avx2_wrapper +#include "../svml_d_exp108_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c new file mode 100644 index 0000000000..d592663169 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized exp10, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#define SYMBOL_NAME _ZGVeN8v_exp10 +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8v_exp10, __GI__ZGVeN8v_exp10, __redirect__ZGVeN8v_exp10) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S new file mode 100644 index 0000000000..5a693e3cdf --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S @@ -0,0 +1,987 @@ +/* Function exp10 vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * Typical exp10() implementation, except that: + * - tables are small (16 elements), allowing for fast gathers + * - all arguments processed in the main path + * - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses) + * - a VAND is used to ensure the reduced argument |R|<2, even for large inputs + * - RZ mode used to avoid overflow to +/-Inf for x*log2(10); helps with special case handling + * - SAE used to avoid spurious flag settings + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVeN8v_exp10_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $192, %rsp + vmovups 256+__svml_dexp10_data_internal_avx512(%rip), %zmm4 + vmovups 320+__svml_dexp10_data_internal_avx512(%rip), %zmm2 + vmovups 384+__svml_dexp10_data_internal_avx512(%rip), %zmm5 + vmovups 448+__svml_dexp10_data_internal_avx512(%rip), %zmm3 + +/* polynomial */ + vmovups 576+__svml_dexp10_data_internal_avx512(%rip), %zmm6 + vmovups 704+__svml_dexp10_data_internal_avx512(%rip), %zmm7 + vmovups 768+__svml_dexp10_data_internal_avx512(%rip), %zmm9 + vmovups 832+__svml_dexp10_data_internal_avx512(%rip), %zmm8 + vmovups 896+__svml_dexp10_data_internal_avx512(%rip), %zmm11 + vmovups 1024+__svml_dexp10_data_internal_avx512(%rip), %zmm14 + vmovaps %zmm0, %zmm1 + +/* 2^(52-4)*1.5 + x * log2(10) */ + vfmadd213pd {rz-sae}, %zmm2, %zmm1, %zmm4 + vandpd 960+__svml_dexp10_data_internal_avx512(%rip), %zmm1, %zmm13 + +/* Z0 ~ x*log2(10), rounded down to 4 fractional bits */ + vsubpd {rn-sae}, %zmm2, %zmm4, %zmm0 + +/* Table lookup: Th */ + vmovups __svml_dexp10_data_internal_avx512(%rip), %zmm2 + vcmppd $29, {sae}, %zmm14, %zmm13, %k0 + +/* R = x - Z0*log10(2) */ + vfnmadd213pd {rn-sae}, %zmm1, %zmm0, %zmm5 + vpermt2pd 64+__svml_dexp10_data_internal_avx512(%rip), %zmm4, %zmm2 + kmovw %k0, %edx + vfnmadd231pd {rn-sae}, %zmm0, %zmm3, %zmm5 + vmovups 
640+__svml_dexp10_data_internal_avx512(%rip), %zmm3 + +/* ensure |R|<2 even for special cases */ + vandpd 512+__svml_dexp10_data_internal_avx512(%rip), %zmm5, %zmm12 + vmulpd {rn-sae}, %zmm12, %zmm12, %zmm10 + vmulpd {rn-sae}, %zmm12, %zmm2, %zmm15 + vfmadd231pd {rn-sae}, %zmm12, %zmm6, %zmm3 + vfmadd231pd {rn-sae}, %zmm12, %zmm7, %zmm9 + vfmadd231pd {rn-sae}, %zmm12, %zmm8, %zmm11 + vfmadd213pd {rn-sae}, %zmm9, %zmm10, %zmm3 + vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm3 + vfmadd213pd {rn-sae}, %zmm2, %zmm15, %zmm3 + vscalefpd {rn-sae}, %zmm0, %zmm3, %zmm0 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm1, 64(%rsp) + vmovups %zmm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + 
+.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), %k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 128(%rsp), %zmm0 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + call __svml_dexp10_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN8v_exp10_skx) + + .align 16,0x90 + +__svml_dexp10_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 6(%rdi), %edx + andl $32752, %edx + shrl $4, %edx + movsd (%rdi), %xmm6 + movsd %xmm6, -8(%rsp) + cmpl $2047, %edx + je .LBL_2_17 + cmpl $970, %edx + jle .LBL_2_15 + movsd 1080+__dexp10_la__imldExp10Tab(%rip), %xmm0 + comisd %xmm6, %xmm0 + jb .LBL_2_14 + comisd 1096+__dexp10_la__imldExp10Tab(%rip), %xmm6 + jb .LBL_2_13 + movsd 
1024+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movaps %xmm6, %xmm5 + mulsd %xmm6, %xmm0 + lea __dexp10_la__imldExp10Tab(%rip), %r10 + movsd %xmm0, -24(%rsp) + movsd -24(%rsp), %xmm1 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rdx + movq %rdx, -8(%rsp) + addsd 1032+__dexp10_la__imldExp10Tab(%rip), %xmm1 + movsd %xmm1, -16(%rsp) + movsd -16(%rsp), %xmm2 + movl -16(%rsp), %r8d + movl %r8d, %ecx + andl $63, %r8d + subsd 1032+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, -24(%rsp) + lea 1(%r8,%r8), %r9d + movsd -24(%rsp), %xmm3 + lea (%r8,%r8), %edi + mulsd 1104+__dexp10_la__imldExp10Tab(%rip), %xmm3 + movsd -24(%rsp), %xmm4 + subsd %xmm3, %xmm5 + mulsd 1112+__dexp10_la__imldExp10Tab(%rip), %xmm4 + movsd 1072+__dexp10_la__imldExp10Tab(%rip), %xmm2 + subsd %xmm4, %xmm5 + mulsd %xmm5, %xmm2 + shrl $6, %ecx + addsd 1064+__dexp10_la__imldExp10Tab(%rip), %xmm2 + comisd 1088+__dexp10_la__imldExp10Tab(%rip), %xmm6 + mulsd %xmm5, %xmm2 + movsd (%r10,%rdi,8), %xmm0 + lea 1023(%rcx), %edx + addsd 1056+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd 1048+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd 1040+__dexp10_la__imldExp10Tab(%rip), %xmm2 + mulsd %xmm5, %xmm2 + addsd (%r10,%r9,8), %xmm2 + mulsd %xmm0, %xmm2 + jb .LBL_2_9 + andl $2047, %edx + addsd %xmm0, %xmm2 + cmpl $2046, %edx + ja .LBL_2_8 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm2 + movsd %xmm2, (%rsi) + ret + +.LBL_2_8: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm0, %xmm2 + mulsd 1152+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, (%rsi) + ret + +.LBL_2_9: + addl $1083, %ecx + andl $2047, %ecx + movl %ecx, %eax + movzwl -2(%rsp), %edx + shll $4, %eax + andl $-32753, %edx + orl %eax, %edx + movw %dx, 
-2(%rsp) + movsd -8(%rsp), %xmm1 + mulsd %xmm1, %xmm2 + mulsd %xmm0, %xmm1 + movaps %xmm1, %xmm0 + addsd %xmm2, %xmm0 + cmpl $50, %ecx + ja .LBL_2_11 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movsd %xmm0, (%rsi) + jmp .LBL_2_12 + +.LBL_2_11: + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm0 + subsd %xmm0, %xmm1 + movsd %xmm1, -64(%rsp) + movsd -64(%rsp), %xmm1 + addsd %xmm2, %xmm1 + movsd %xmm1, -64(%rsp) + movsd -72(%rsp), %xmm2 + mulsd 1168+__dexp10_la__imldExp10Tab(%rip), %xmm2 + movsd %xmm2, -56(%rsp) + movsd -72(%rsp), %xmm4 + movsd -56(%rsp), %xmm3 + addsd %xmm3, %xmm4 + movsd %xmm4, -48(%rsp) + movsd -48(%rsp), %xmm6 + movsd -56(%rsp), %xmm5 + subsd %xmm5, %xmm6 + movsd %xmm6, -40(%rsp) + movsd -72(%rsp), %xmm8 + movsd -40(%rsp), %xmm7 + subsd %xmm7, %xmm8 + movsd %xmm8, -32(%rsp) + movsd -64(%rsp), %xmm10 + movsd -32(%rsp), %xmm9 + addsd %xmm9, %xmm10 + movsd %xmm10, -32(%rsp) + movsd -40(%rsp), %xmm11 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm11 + movsd %xmm11, -40(%rsp) + movsd -32(%rsp), %xmm12 + mulsd 1160+__dexp10_la__imldExp10Tab(%rip), %xmm12 + movsd %xmm12, -32(%rsp) + movsd -40(%rsp), %xmm14 + movsd -32(%rsp), %xmm13 + addsd %xmm13, %xmm14 + movsd %xmm14, (%rsi) + +.LBL_2_12: + movl $4, %eax + ret + +.LBL_2_13: + movsd 1120+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movl $4, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_14: + movsd 1128+__dexp10_la__imldExp10Tab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_15: + movsd 1144+__dexp10_la__imldExp10Tab(%rip), %xmm0 + addsd %xmm6, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_17: + movb -1(%rsp), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_19 + +.LBL_2_18: + mulsd %xmm6, %xmm6 + movsd %xmm6, (%rsi) + ret + +.LBL_2_19: + testl $1048575, -4(%rsp) + jne .LBL_2_18 + cmpl $0, -8(%rsp) + jne .LBL_2_18 + movq 1136+__dexp10_la__imldExp10Tab(%rip), %rdx + movq %rdx, (%rsi) + ret + + cfi_endproc + + .type 
__svml_dexp10_cout_rare_internal,@function + .size __svml_dexp10_cout_rare_internal,.-__svml_dexp10_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dexp10_data_internal_avx512: + .long 0 + .long 1072693248 + .long 1828292879 + .long 1072739672 + .long 1014845819 + .long 1072788152 + .long 1853186616 + .long 1072838778 + .long 171030293 + .long 1072891646 + .long 1276261410 + .long 1072946854 + .long 3577096743 + .long 1073004506 + .long 3712504873 + .long 1073064711 + .long 1719614413 + .long 1073127582 + .long 1944781191 + .long 1073193236 + .long 1110089947 + .long 1073261797 + .long 2191782032 + .long 1073333393 + .long 2572866477 + .long 1073408159 + .long 3716502172 + .long 1073486235 + .long 3707479175 + .long 1073567768 + .long 2728693978 + .long 1073652911 + .long 0 + .long 0 + .long 1568897901 + .long 1016568486 + .long 3936719688 + .long 3162512149 + .long 3819481236 + .long 1016499965 + .long 1303423926 + .long 1015238005 + .long 2804567149 + .long 1015390024 + .long 3145379760 + .long 1014403278 + .long 3793507337 + .long 1016095713 + .long 3210617384 + .long 3163796463 + .long 3108873501 + .long 3162190556 + .long 3253791412 + .long 1015920431 + .long 730975783 + .long 1014083580 + .long 2462790535 + .long 1015814775 + .long 816778419 + .long 1014197934 + .long 2789017511 + .long 1014276997 + .long 2413007344 + .long 3163551506 + .long 158966641 + .long 1074434895 + .long 158966641 + .long 1074434895 + .long 158966641 + .long 1074434895 + .long 158966641 + .long 1074434895 + .long 158966641 + .long 1074434895 + .long 158966641 + .long 1074434895 + .long 158966641 + .long 1074434895 + .long 158966641 + .long 1074434895 + .long 16368 + .long 1123549184 + .long 16368 + .long 1123549184 + .long 16368 + .long 1123549184 + .long 16368 + .long 1123549184 + .long 16368 + .long 1123549184 + .long 16368 + .long 1123549184 + .long 16368 + .long 1123549184 + .long 16368 + .long 1123549184 + .long 1352628735 + .long 1070810131 + .long 1352628735 
+ .long 1070810131 + .long 1352628735 + .long 1070810131 + .long 1352628735 + .long 1070810131 + .long 1352628735 + .long 1070810131 + .long 1352628735 + .long 1070810131 + .long 1352628735 + .long 1070810131 + .long 1352628735 + .long 1070810131 + .long 2845113633 + .long 3158957085 + .long 2845113633 + .long 3158957085 + .long 2845113633 + .long 3158957085 + .long 2845113633 + .long 3158957085 + .long 2845113633 + .long 3158957085 + .long 2845113633 + .long 3158957085 + .long 2845113633 + .long 3158957085 + .long 2845113633 + .long 3158957085 + .long 4294967295 + .long 3221225471 + .long 4294967295 + .long 3221225471 + .long 4294967295 + .long 3221225471 + .long 4294967295 + .long 3221225471 + .long 4294967295 + .long 3221225471 + .long 4294967295 + .long 3221225471 + .long 4294967295 + .long 3221225471 + .long 4294967295 + .long 3221225471 + .long 3635159072 + .long 1070273406 + .long 3635159072 + .long 1070273406 + .long 3635159072 + .long 1070273406 + .long 3635159072 + .long 1070273406 + .long 3635159072 + .long 1070273406 + .long 3635159072 + .long 1070273406 + .long 3635159072 + .long 1070273406 + .long 3635159072 + .long 1070273406 + .long 3796866084 + .long 1071727016 + .long 3796866084 + .long 1071727016 + .long 3796866084 + .long 1071727016 + .long 3796866084 + .long 1071727016 + .long 3796866084 + .long 1071727016 + .long 3796866084 + .long 1071727016 + .long 3796866084 + .long 1071727016 + .long 3796866084 + .long 1071727016 + .long 2693950621 + .long 1072872823 + .long 2693950621 + .long 1072872823 + .long 2693950621 + .long 1072872823 + .long 2693950621 + .long 1072872823 + .long 2693950621 + .long 1072872823 + .long 2693950621 + .long 1072872823 + .long 2693950621 + .long 1072872823 + .long 2693950621 + .long 1072872823 + .long 2424767688 + .long 1073760005 + .long 2424767688 + .long 1073760005 + .long 2424767688 + .long 1073760005 + .long 2424767688 + .long 1073760005 + .long 2424767688 + .long 1073760005 + .long 2424767688 + .long 1073760005 + 
.long 2424767688 + .long 1073760005 + .long 2424767688 + .long 1073760005 + .long 3342728741 + .long 1074083108 + .long 3342728741 + .long 1074083108 + .long 3342728741 + .long 1074083108 + .long 3342728741 + .long 1074083108 + .long 3342728741 + .long 1074083108 + .long 3342728741 + .long 1074083108 + .long 3342728741 + .long 1074083108 + .long 3342728741 + .long 1074083108 + .long 3149223106 + .long 1073900465 + .long 3149223106 + .long 1073900465 + .long 3149223106 + .long 1073900465 + .long 3149223106 + .long 1073900465 + .long 3149223106 + .long 1073900465 + .long 3149223106 + .long 1073900465 + .long 3149223106 + .long 1073900465 + .long 3149223106 + .long 1073900465 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 1190603329 + .long 1081293425 + .long 1190603329 + .long 1081293425 + .long 1190603329 + .long 1081293425 + .long 1190603329 + .long 1081293425 + .long 1190603329 + .long 1081293425 + .long 1190603329 + .long 1081293425 + .long 1190603329 + .long 1081293425 + .long 1190603329 + .long 1081293425 + .long 0 + .long 1005584384 + .long 0 + .long 1005584384 + .long 0 + .long 1005584384 + .long 0 + .long 1005584384 + .long 0 + .long 1005584384 + .long 0 + .long 1005584384 + .long 0 + .long 1005584384 + .long 0 + .long 1005584384 + .long 120 + .long 0 + .long 120 + .long 0 + .long 120 + .long 0 + .long 120 + .long 0 + .long 120 + .long 0 + .long 120 + .long 0 + .long 120 + .long 0 + .long 120 + .long 0 + .long 32752 + .long 0 + .long 32752 + .long 0 + .long 32752 + .long 0 + .long 32752 + .long 0 + .long 32752 + .long 0 + .long 32752 + .long 0 + .long 32752 + .long 0 + .long 32752 + .long 0 + .type __svml_dexp10_data_internal_avx512,@object + .size __svml_dexp10_data_internal_avx512,1280 + .align 
32 + +__dexp10_la__imldExp10Tab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 2631457885 + .long 3161546771 + .long 3541402996 + .long 1072716208 + .long 896005651 + .long 1015861842 + .long 410360776 + .long 1072727877 + .long 1642514529 + .long 1012987726 + .long 1828292879 + .long 1072739672 + .long 1568897901 + .long 1016568486 + .long 852742562 + .long 1072751596 + .long 1882168529 + .long 1010744893 + .long 3490863953 + .long 1072763649 + .long 707771662 + .long 3163903570 + .long 2930322912 + .long 1072775834 + .long 3117806614 + .long 3163670819 + .long 1014845819 + .long 1072788152 + .long 3936719688 + .long 3162512149 + .long 3949972341 + .long 1072800603 + .long 1058231231 + .long 1015777676 + .long 828946858 + .long 1072813191 + .long 1044000608 + .long 1016786167 + .long 2288159958 + .long 1072825915 + .long 1151779725 + .long 1015705409 + .long 1853186616 + .long 1072838778 + .long 3819481236 + .long 1016499965 + .long 1709341917 + .long 1072851781 + .long 2552227826 + .long 1015039787 + .long 4112506593 + .long 1072864925 + .long 1829350193 + .long 1015216097 + .long 2799960843 + .long 1072878213 + .long 1913391796 + .long 1015756674 + .long 171030293 + .long 1072891646 + .long 1303423926 + .long 1015238005 + .long 2992903935 + .long 1072905224 + .long 1574172746 + .long 1016061241 + .long 926591435 + .long 1072918951 + .long 3427487848 + .long 3163704045 + .long 887463927 + .long 1072932827 + .long 1049900754 + .long 3161575912 + .long 1276261410 + .long 1072946854 + .long 2804567149 + .long 1015390024 + .long 569847338 + .long 1072961034 + .long 1209502043 + .long 3159926671 + .long 1617004845 + .long 1072975368 + .long 1623370769 + .long 1011049453 + .long 3049340112 + .long 1072989858 + .long 3667985273 + .long 1013894369 + .long 3577096743 + .long 1073004506 + .long 3145379760 + .long 1014403278 + .long 1990012071 + .long 1073019314 + .long 7447438 + .long 3163526196 + .long 1453150082 + .long 
1073034283 + .long 3171891295 + .long 3162037958 + .long 917841882 + .long 1073049415 + .long 419288974 + .long 1016280325 + .long 3712504873 + .long 1073064711 + .long 3793507337 + .long 1016095713 + .long 363667784 + .long 1073080175 + .long 728023093 + .long 1016345318 + .long 2956612997 + .long 1073095806 + .long 1005538728 + .long 3163304901 + .long 2186617381 + .long 1073111608 + .long 2018924632 + .long 3163803357 + .long 1719614413 + .long 1073127582 + .long 3210617384 + .long 3163796463 + .long 1013258799 + .long 1073143730 + .long 3094194670 + .long 3160631279 + .long 3907805044 + .long 1073160053 + .long 2119843535 + .long 3161988964 + .long 1447192521 + .long 1073176555 + .long 508946058 + .long 3162904882 + .long 1944781191 + .long 1073193236 + .long 3108873501 + .long 3162190556 + .long 919555682 + .long 1073210099 + .long 2882956373 + .long 1013312481 + .long 2571947539 + .long 1073227145 + .long 4047189812 + .long 3163777462 + .long 2604962541 + .long 1073244377 + .long 3631372142 + .long 3163870288 + .long 1110089947 + .long 1073261797 + .long 3253791412 + .long 1015920431 + .long 2568320822 + .long 1073279406 + .long 1509121860 + .long 1014756995 + .long 2966275557 + .long 1073297207 + .long 2339118633 + .long 3160254904 + .long 2682146384 + .long 1073315202 + .long 586480042 + .long 3163702083 + .long 2191782032 + .long 1073333393 + .long 730975783 + .long 1014083580 + .long 2069751141 + .long 1073351782 + .long 576856675 + .long 3163014404 + .long 2990417245 + .long 1073370371 + .long 3552361237 + .long 3163667409 + .long 1434058175 + .long 1073389163 + .long 1853053619 + .long 1015310724 + .long 2572866477 + .long 1073408159 + .long 2462790535 + .long 1015814775 + .long 3092190715 + .long 1073427362 + .long 1457303226 + .long 3159737305 + .long 4076559943 + .long 1073446774 + .long 950899508 + .long 3160987380 + .long 2420883922 + .long 1073466398 + .long 174054861 + .long 1014300631 + .long 3716502172 + .long 1073486235 + .long 816778419 + 
.long 1014197934 + .long 777507147 + .long 1073506289 + .long 3507050924 + .long 1015341199 + .long 3706687593 + .long 1073526560 + .long 1821514088 + .long 1013410604 + .long 1242007932 + .long 1073547053 + .long 1073740399 + .long 3163532637 + .long 3707479175 + .long 1073567768 + .long 2789017511 + .long 1014276997 + .long 64696965 + .long 1073588710 + .long 3586233004 + .long 1015962192 + .long 863738719 + .long 1073609879 + .long 129252895 + .long 3162690849 + .long 3884662774 + .long 1073631278 + .long 1614448851 + .long 1014281732 + .long 2728693978 + .long 1073652911 + .long 2413007344 + .long 3163551506 + .long 3999357479 + .long 1073674779 + .long 1101668360 + .long 1015989180 + .long 1533953344 + .long 1073696886 + .long 835814894 + .long 1015702697 + .long 2174652632 + .long 1073719233 + .long 1301400989 + .long 1014466875 + .long 158966641 + .long 1080726351 + .long 0 + .long 1127743488 + .long 3149223189 + .long 1073900465 + .long 3342651914 + .long 1074083108 + .long 2447555600 + .long 1073760005 + .long 3091263679 + .long 1072872823 + .long 2277968517 + .long 1071727228 + .long 1352628734 + .long 1081295891 + .long 1190603329 + .long 3228777073 + .long 1189309266 + .long 3228842423 + .long 1352597504 + .long 1064518675 + .long 3296479949 + .long 1025408956 + .long 1 + .long 1048576 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 0 + .long 1009778688 + .long 0 + .long 1106771968 + .type __dexp10_la__imldExp10Tab,@object + .size __dexp10_la__imldExp10Tab,1176 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S new file mode 100644 index 0000000000..dda41c9c8f --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized exp10f. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>.  */ + +#define _ZGVeN16v_exp10f _ZGVeN16v_exp10f_avx2_wrapper +#include "../svml_s_exp10f16_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c new file mode 100644 index 0000000000..8176a5912b --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized exp10f, vector length is 16. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#define SYMBOL_NAME _ZGVeN16v_exp10f +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16v_exp10f, __GI__ZGVeN16v_exp10f, + __redirect__ZGVeN16v_exp10f) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S new file mode 100644 index 0000000000..9b544f49b5 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S @@ -0,0 +1,739 @@ +/* Function exp10f vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * Typical exp10() implementation, except that: + * - tables are small (16 elements), allowing for fast gathers + * - all arguments processed in the main path + * - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses) + * - a VAND is used to ensure the reduced argument |R|<2, even for large inputs + * - RZ mode used to avoid overflow to +/-Inf for x*log2(e); helps with special case handling + * - SAE used to avoid spurious flag settings + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVeN16v_exp10f_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $192, %rsp + vmovups 256+__svml_sexp10_data_internal_avx512(%rip), %zmm2 + vmovups 320+__svml_sexp10_data_internal_avx512(%rip), %zmm1 + vmovups 384+__svml_sexp10_data_internal_avx512(%rip), %zmm5 + vmovups 448+__svml_sexp10_data_internal_avx512(%rip), %zmm4 + +/* ensure |R|<2 even for special cases */ + vmovups 512+__svml_sexp10_data_internal_avx512(%rip), %zmm6 + vmovups 1152+__svml_sexp10_data_internal_avx512(%rip), %zmm9 + +/* 2^(52-4)*1.5 + x * log2(e) */ + vfmadd213ps {rz-sae}, %zmm1, %zmm0, %zmm2 + vmovups 1216+__svml_sexp10_data_internal_avx512(%rip), %zmm10 + vmovups __svml_sexp10_data_internal_avx512(%rip), %zmm8 + vmovups 128+__svml_sexp10_data_internal_avx512(%rip), %zmm15 + vmovups 640+__svml_sexp10_data_internal_avx512(%rip), %zmm13 + vpsrld $5, %zmm2, %zmm3 + +/* Z0 ~ x*log2(e), rounded down to 6 fractional bits */ + vsubps {rn-sae}, %zmm1, %zmm2, %zmm1 + vpermt2ps 64+__svml_sexp10_data_internal_avx512(%rip), %zmm2, %zmm8 + vpermt2ps 192+__svml_sexp10_data_internal_avx512(%rip), %zmm3, %zmm15 + vandps 576+__svml_sexp10_data_internal_avx512(%rip), %zmm0, %zmm12 + +/* R = x - Z0*log(2) */ + vfnmadd213ps {rn-sae}, %zmm0, %zmm1, %zmm5 + vcmpps $29, {sae}, %zmm13, %zmm12, %k0 + vfnmadd231ps {rn-sae}, %zmm1, %zmm4, %zmm5 + kmovw %k0, %edx + vrangeps $2, {sae}, 
%zmm6, %zmm5, %zmm11 + vfmadd231ps {rn-sae}, %zmm11, %zmm9, %zmm10 + vmulps {rn-sae}, %zmm11, %zmm10, %zmm14 + +/* x!=0? */ + vpxord %zmm7, %zmm7, %zmm7 + vcmpps $4, {sae}, %zmm7, %zmm0, %k1 + +/* Th*Tl */ + vmulps {rn-sae}, %zmm8, %zmm15, %zmm15{%k1} + vfmadd213ps {rn-sae}, %zmm15, %zmm14, %zmm15 + vscalefps {rn-sae}, %zmm1, %zmm15, %zmm1 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovaps %zmm1, %zmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm0, 64(%rsp) + vmovups %zmm1, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $16, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), 
%k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 128(%rsp), %zmm1 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + call __svml_sexp10_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN16v_exp10f_skx) + + .align 16,0x90 + +__svml_sexp10_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 2(%rdi), %edx + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_12 + movss (%rdi), %xmm8 + movss .FLT_55(%rip), %xmm0 + movl %eax, -12(%rsp) + comiss %xmm8, %xmm0 + jb .LBL_2_10 + comiss .FLT_56(%rip), %xmm8 + jb .LBL_2_9 + movss .FLT_58(%rip), %xmm0 + movaps %xmm8, %xmm6 + mulss %xmm8, %xmm0 + comiss .FLT_68(%rip), %xmm8 + movss %xmm0, -24(%rsp) + movss -24(%rsp), %xmm1 + movss .FLT_59(%rip), %xmm2 + movss .FLT_67(%rip), %xmm7 + addss %xmm2, %xmm1 + movss %xmm1, -20(%rsp) + movss -20(%rsp), %xmm3 + movswl -20(%rsp), %ecx + 
subss %xmm2, %xmm3 + movss %xmm3, -24(%rsp) + movss -24(%rsp), %xmm4 + mulss .FLT_60(%rip), %xmm4 + movss -24(%rsp), %xmm5 + subss %xmm4, %xmm6 + mulss .FLT_61(%rip), %xmm5 + subss %xmm5, %xmm6 + mulss %xmm6, %xmm7 + addss .FLT_66(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_65(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_64(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_63(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_62(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_72(%rip), %xmm7 + movss %xmm7, -16(%rsp) + jb .LBL_2_8 + lea 127(%rcx), %edx + cmpl $254, %edx + ja .LBL_2_7 + movzbl %dl, %edx + shll $7, %edx + movw %dx, -10(%rsp) + movss -16(%rsp), %xmm0 + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_7: + addl $126, %ecx + movzbl %cl, %ecx + movzwl -10(%rsp), %edx + shll $7, %ecx + andl $-32641, %edx + orl %ecx, %edx + movss -16(%rsp), %xmm0 + movw %dx, -10(%rsp) + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_71(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %edi + movl %edi, (%rsi) + ret + +.LBL_2_8: + addl $-69, %ecx + movzbl %cl, %ecx + movzwl -10(%rsp), %eax + shll $7, %ecx + andl $-32641, %eax + orl %ecx, %eax + movss -16(%rsp), %xmm0 + movw %ax, -10(%rsp) + movl $4, %eax + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_70(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %edx + movl %edx, (%rsi) + ret + +.LBL_2_9: + movss .FLT_69(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $4, %eax + ret + +.LBL_2_10: + movss .FLT_57(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $3, %eax + ret + +.LBL_2_12: + movb 3(%rdi), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_14 + +.LBL_2_13: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%rsi) + ret + +.LBL_2_14: + testl 
$8388607, (%rdi) + jne .LBL_2_13 + movl %eax, (%rsi) + ret + + cfi_endproc + + .type __svml_sexp10_cout_rare_internal,@function + .size __svml_sexp10_cout_rare_internal,.-__svml_sexp10_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_sexp10_data_internal_avx512: + .long 1065353217 + .long 1065358897 + .long 1065364581 + .long 1065370269 + .long 1065375961 + .long 1065381656 + .long 1065387356 + .long 1065393059 + .long 1065398766 + .long 1065404477 + .long 1065410192 + .long 1065415911 + .long 1065421634 + .long 1065427360 + .long 1065433091 + .long 1065438825 + .long 1065444563 + .long 1065450305 + .long 1065456051 + .long 1065461801 + .long 1065467554 + .long 1065473312 + .long 1065479074 + .long 1065484839 + .long 1065490608 + .long 1065496381 + .long 1065502159 + .long 1065507940 + .long 1065513725 + .long 1065519513 + .long 1065525306 + .long 1065531103 + .long 1065353216 + .long 1065536903 + .long 1065724611 + .long 1065916431 + .long 1066112450 + .long 1066312762 + .long 1066517459 + .long 1066726640 + .long 1066940400 + .long 1067158842 + .long 1067382066 + .long 1067610179 + .long 1067843287 + .long 1068081499 + .long 1068324927 + .long 1068573686 + .long 1068827891 + .long 1069087663 + .long 1069353124 + .long 1069624397 + .long 1069901610 + .long 1070184894 + .long 1070474380 + .long 1070770206 + .long 1071072509 + .long 1071381432 + .long 1071697119 + .long 1072019719 + .long 1072349383 + .long 1072686266 + .long 1073030525 + .long 1073382323 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 
1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1178599424 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 1050288283 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 2994079840 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 796917760 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + 
.long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 3968 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 1079286392 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 865573431 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 3221225471 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1076475866 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 
1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .long 1075010957 + .type __svml_sexp10_data_internal_avx512,@object + .size __svml_sexp10_data_internal_avx512,1280 + .align 4 + +.FLT_55: + .long 0x421a209b + .type .FLT_55,@object + .size .FLT_55,4 + .align 4 + +.FLT_56: + .long 0xc2349e35 + .type .FLT_56,@object + .size .FLT_56,4 + .align 4 + +.FLT_57: + .long 0x7f7fffff + .type .FLT_57,@object + .size .FLT_57,4 + .align 4 + +.FLT_58: + .long 0x40549a78 + .type .FLT_58,@object + .size .FLT_58,4 + .align 4 + +.FLT_59: + .long 0x4b400000 + .type .FLT_59,@object + .size .FLT_59,4 + .align 4 + +.FLT_60: + .long 0x3e9a2000 + .type .FLT_60,@object + .size .FLT_60,4 + .align 4 + +.FLT_61: + .long 0x369a84fc + .type .FLT_61,@object + .size .FLT_61,4 + .align 4 + +.FLT_62: + .long 0x40135d8e + .type .FLT_62,@object + .size .FLT_62,4 + .align 4 + +.FLT_63: + .long 0x4029a927 + .type .FLT_63,@object + .size .FLT_63,4 + .align 4 + +.FLT_64: + .long 0x400237ac + .type .FLT_64,@object + .size .FLT_64,4 + .align 4 + +.FLT_65: + .long 0x3f95eb27 + .type .FLT_65,@object + .size .FLT_65,4 + .align 4 + +.FLT_66: + .long 0x3f0ac64a + .type .FLT_66,@object + .size .FLT_66,4 + .align 4 + +.FLT_67: + .long 0x3e54fd23 + .type .FLT_67,@object + .size .FLT_67,4 + .align 4 + +.FLT_68: + .long 0xc217b819 + .type .FLT_68,@object + .size .FLT_68,4 + .align 4 + +.FLT_69: + .long 0x00000001 + .type .FLT_69,@object + .size .FLT_69,4 + .align 4 + +.FLT_70: + .long 0x21800000 + .type .FLT_70,@object + .size .FLT_70,4 + .align 4 + +.FLT_71: + .long 0x40000000 + .type .FLT_71,@object + .size .FLT_71,4 + .align 4 + +.FLT_72: + .long 0x3f800000 + .type .FLT_72,@object + .size .FLT_72,4 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S new file mode 100644 index 0000000000..460d01357d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S @@ -0,0 
+1,20 @@ +/* SSE2 version of vectorized exp10f, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>.  */ + +#define _ZGVbN4v_exp10f _ZGVbN4v_exp10f_sse2 +#include "../svml_s_exp10f4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c new file mode 100644 index 0000000000..7ce90a9bae --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized exp10f, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#define SYMBOL_NAME _ZGVbN4v_exp10f +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4v_exp10f, __GI__ZGVbN4v_exp10f, + __redirect__ZGVbN4v_exp10f) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S new file mode 100644 index 0000000000..152bf2865d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S @@ -0,0 +1,1038 @@ +/* Function exp10f vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * + * exp10(x) = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y)) + * where + * x = m*log10(2)/K + y, y in [-log10(2)/K..log10(2)/K] + * m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + * values of 2^j/K are tabulated + * + * P(y) is a minimax polynomial approximation of exp10(x)-1 + * on small interval [-log10(2)/K..log10(2)/K] + * + * Special cases: + * + * exp10(NaN) = NaN + * exp10(+INF) = +INF + * exp10(-INF) = 0 + * exp10(x) = 1 for subnormals + * For IEEE float + * if x > 38.5318412780761720 then exp10f(x) overflow + * if x < -45.4555282592773440 then exp10f(x) underflow + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVbN4v_exp10f_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + movaps %xmm0, %xmm4 + +/* Load argument */ + movups 128+__svml_sexp10_data_internal(%rip), %xmm6 + lea __svml_sexp10_data_internal(%rip), %rax + mulps %xmm4, %xmm6 + movups 192+__svml_sexp10_data_internal(%rip), %xmm0 + movdqu 576+__svml_sexp10_data_internal(%rip), %xmm5 + addps %xmm0, %xmm6 + +/* Index and lookup */ + movdqa %xmm5, %xmm3 + movaps %xmm6, %xmm7 + pand %xmm6, %xmm3 + +/* remove index bits */ + pandn %xmm6, %xmm5 + +/* iIndex *= sizeof(S); */ + pslld $2, %xmm3 + +/* 2^N */ + pslld $18, %xmm5 + pshufd $1, %xmm3, %xmm2 + subps %xmm0, %xmm7 + movd %xmm3, %ecx + movd %xmm2, %r8d + pshufd $2, %xmm3, %xmm2 + pshufd $3, %xmm3, %xmm3 + movd %xmm2, %r9d + movd %xmm3, %r10d + +/* Check for overflow/underflow */ + movdqu 640+__svml_sexp10_data_internal(%rip), %xmm1 + pand %xmm4, %xmm1 + movslq %ecx, %rcx + movslq %r8d, %r8 + movslq %r9d, %r9 + movslq %r10d, %r10 + pcmpgtd 704+__svml_sexp10_data_internal(%rip), %xmm1 + movmskps %xmm1, %edx + movd (%rax,%rcx), %xmm0 + movd (%rax,%r8), %xmm1 + punpckldq %xmm1, %xmm0 + movd (%rax,%r9), %xmm2 + movd (%rax,%r10), %xmm1 + punpckldq %xmm1, %xmm2 + +/* R */ + movaps %xmm4, %xmm1 + punpcklqdq %xmm2, %xmm0 + 
movups 256+__svml_sexp10_data_internal(%rip), %xmm2 + mulps %xmm7, %xmm2 + movups 320+__svml_sexp10_data_internal(%rip), %xmm3 + subps %xmm2, %xmm1 + mulps %xmm7, %xmm3 + +/* + * Polynomial + * exp10 = 2^N*(Tj+Tj*poly) + * poly(sN) = {1+later} a0+a1*sR + */ + movups 512+__svml_sexp10_data_internal(%rip), %xmm7 + subps %xmm3, %xmm1 + mulps %xmm1, %xmm7 + addps 448+__svml_sexp10_data_internal(%rip), %xmm7 + mulps %xmm7, %xmm1 + addps 384+__svml_sexp10_data_internal(%rip), %xmm1 + +/* Tj_l+Tj_h*poly */ + mulps %xmm0, %xmm1 + addps %xmm1, %xmm0 + +/* quick mul 2^N */ + paddd %xmm5, %xmm0 + +/* Finish */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm4, 192(%rsp) + movups %xmm0, 256(%rsp) + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 
0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 
0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,4), %rdi + lea 256(%rsp,%r12,4), %rsi + call __svml_sexp10_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN4v_exp10f_sse4) + + .align 16,0x90 + +__svml_sexp10_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 2(%rdi), %edx + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_12 + movss (%rdi), %xmm8 + movss .FLT_57(%rip), %xmm0 + movl %eax, -12(%rsp) + comiss %xmm8, %xmm0 + jb .LBL_2_10 + comiss .FLT_58(%rip), %xmm8 + jb .LBL_2_9 + movss .FLT_60(%rip), %xmm0 + movaps %xmm8, %xmm6 + mulss %xmm8, %xmm0 + comiss .FLT_70(%rip), %xmm8 + movss %xmm0, -24(%rsp) + movss -24(%rsp), %xmm1 + movss .FLT_61(%rip), %xmm2 + movss .FLT_69(%rip), %xmm7 + addss %xmm2, %xmm1 + movss %xmm1, -20(%rsp) + movss -20(%rsp), %xmm3 + movswl -20(%rsp), %ecx + subss %xmm2, %xmm3 + movss %xmm3, -24(%rsp) + movss -24(%rsp), %xmm4 + mulss .FLT_62(%rip), %xmm4 + movss -24(%rsp), %xmm5 + subss %xmm4, %xmm6 + mulss .FLT_63(%rip), %xmm5 + subss %xmm5, %xmm6 + mulss %xmm6, %xmm7 + addss .FLT_68(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_67(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_66(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_65(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_64(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_74(%rip), %xmm7 + movss %xmm7, -16(%rsp) + jb .LBL_2_8 + lea 127(%rcx), %edx + cmpl $254, %edx + ja .LBL_2_7 + movzbl %dl, 
%edx + shll $7, %edx + movw %dx, -10(%rsp) + movss -16(%rsp), %xmm0 + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_7: + addl $126, %ecx + movzbl %cl, %ecx + movzwl -10(%rsp), %edx + shll $7, %ecx + andl $-32641, %edx + orl %ecx, %edx + movss -16(%rsp), %xmm0 + movw %dx, -10(%rsp) + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_73(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %edi + movl %edi, (%rsi) + ret + +.LBL_2_8: + addl $-69, %ecx + movzbl %cl, %ecx + movzwl -10(%rsp), %eax + shll $7, %ecx + andl $-32641, %eax + orl %ecx, %eax + movss -16(%rsp), %xmm0 + movw %ax, -10(%rsp) + movl $4, %eax + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_72(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %edx + movl %edx, (%rsi) + ret + +.LBL_2_9: + movss .FLT_71(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $4, %eax + ret + +.LBL_2_10: + movss .FLT_59(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $3, %eax + ret + +.LBL_2_12: + movb 3(%rdi), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_14 + +.LBL_2_13: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%rsi) + ret + +.LBL_2_14: + testl $8388607, (%rdi) + jne .LBL_2_13 + movl %eax, (%rsi) + ret + + cfi_endproc + + .type __svml_sexp10_cout_rare_internal,@function + .size __svml_sexp10_cout_rare_internal,.-__svml_sexp10_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_sexp10_data_internal: + .long 1065353216 + .long 1065536903 + .long 1065724611 + .long 1065916431 + .long 1066112450 + .long 1066312762 + .long 1066517459 + .long 1066726640 + .long 1066940400 + .long 1067158842 + .long 1067382066 + .long 1067610179 + .long 1067843287 + .long 1068081499 + .long 1068324927 + .long 1068573686 + .long 1068827891 + .long 
1069087663 + .long 1069353124 + .long 1069624397 + .long 1069901610 + .long 1070184894 + .long 1070474380 + .long 1070770206 + .long 1071072509 + .long 1071381432 + .long 1071697119 + .long 1072019719 + .long 1072349383 + .long 1072686266 + .long 1073030525 + .long 1073382323 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 874153212 + .long 874153212 + .long 874153212 
+ .long 874153212 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 804046952 + .long 804046952 + .long 804046952 + .long 804046952 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 31 + .long 31 + .long 31 + .long 31 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .long 1108850712 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_sexp10_data_internal,@object + .size __svml_sexp10_data_internal,768 + .align 4 + +.FLT_57: + .long 0x421a209b + .type .FLT_57,@object + .size .FLT_57,4 + .align 4 + +.FLT_58: + .long 
0xc2349e35 + .type .FLT_58,@object + .size .FLT_58,4 + .align 4 + +.FLT_59: + .long 0x7f7fffff + .type .FLT_59,@object + .size .FLT_59,4 + .align 4 + +.FLT_60: + .long 0x40549a78 + .type .FLT_60,@object + .size .FLT_60,4 + .align 4 + +.FLT_61: + .long 0x4b400000 + .type .FLT_61,@object + .size .FLT_61,4 + .align 4 + +.FLT_62: + .long 0x3e9a2000 + .type .FLT_62,@object + .size .FLT_62,4 + .align 4 + +.FLT_63: + .long 0x369a84fc + .type .FLT_63,@object + .size .FLT_63,4 + .align 4 + +.FLT_64: + .long 0x40135d8e + .type .FLT_64,@object + .size .FLT_64,4 + .align 4 + +.FLT_65: + .long 0x4029a927 + .type .FLT_65,@object + .size .FLT_65,4 + .align 4 + +.FLT_66: + .long 0x400237ac + .type .FLT_66,@object + .size .FLT_66,4 + .align 4 + +.FLT_67: + .long 0x3f95eb27 + .type .FLT_67,@object + .size .FLT_67,4 + .align 4 + +.FLT_68: + .long 0x3f0ac64a + .type .FLT_68,@object + .size .FLT_68,4 + .align 4 + +.FLT_69: + .long 0x3e54fd23 + .type .FLT_69,@object + .size .FLT_69,4 + .align 4 + +.FLT_70: + .long 0xc217b819 + .type .FLT_70,@object + .size .FLT_70,4 + .align 4 + +.FLT_71: + .long 0x00000001 + .type .FLT_71,@object + .size .FLT_71,4 + .align 4 + +.FLT_72: + .long 0x21800000 + .type .FLT_72,@object + .size .FLT_72,4 + .align 4 + +.FLT_73: + .long 0x40000000 + .type .FLT_73,@object + .size .FLT_73,4 + .align 4 + +.FLT_74: + .long 0x3f800000 + .type .FLT_74,@object + .size .FLT_74,4 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S new file mode 100644 index 0000000000..3f3fe252da --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized exp10f, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVdN8v_exp10f _ZGVdN8v_exp10f_sse_wrapper +#include "../svml_s_exp10f8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c new file mode 100644 index 0000000000..1f5ed5a59d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized exp10f, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#define SYMBOL_NAME _ZGVdN8v_exp10f +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8v_exp10f, __GI__ZGVdN8v_exp10f, + __redirect__ZGVdN8v_exp10f) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S new file mode 100644 index 0000000000..9bc9822c3d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S @@ -0,0 +1,923 @@ +/* Function exp10f vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * + *   exp10(x) = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y)) + *   where + *      x = m*log10(2)/K + y, y in [-log10(2)/K..log10(2)/K] + *      m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + *      values of 2^j/K are tabulated + * + *      P(y) is a minimax polynomial approximation of exp10(x)-1 + *      on small interval [-log10(2)/K..log10(2)/K] + * + *  Special cases: + * + *   exp10(NaN)  = NaN + *   exp10(+INF) = +INF + *   exp10(-INF) = 0 + *   exp10(x) = 1 for subnormals + *   For IEEE float + *     if x > 38.5318412780761720 then exp10f(x) overflow + *     if x < -45.4555282592773440 then exp10f(x) underflow + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVdN8v_exp10f_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + lea __svml_sexp10_data_internal(%rip), %rax + vmovups 192+__svml_sexp10_data_internal(%rip), %ymm4 + +/* Load argument */ + vmovups 128+__svml_sexp10_data_internal(%rip), %ymm1 + vmovups 576+__svml_sexp10_data_internal(%rip), %ymm2 + vmovups %ymm9, 96(%rsp) + vmovups %ymm8, 32(%rsp) + vmovups %ymm14, 288(%rsp) + vmovups %ymm10, 160(%rsp) + vmovups %ymm15, 320(%rsp) + vmovups %ymm13, 256(%rsp) + vmovups %ymm12, 224(%rsp) + vmovups %ymm11, 192(%rsp) + vmovaps %ymm0, %ymm3 + vfmadd213ps %ymm4, %ymm3, %ymm1 + +/* Check for overflow/underflow */ + vandps 640+__svml_sexp10_data_internal(%rip), %ymm3, %ymm5 + vpcmpgtd 704+__svml_sexp10_data_internal(%rip), %ymm5, %ymm6 + +/* Index and lookup */ + vandps %ymm2, %ymm1, %ymm7 + vmovmskps %ymm6, %edx + +/* iIndex *= sizeof(S); */ + vpslld $2, %ymm7, %ymm6 + vsubps %ymm4, %ymm1, %ymm0 + vmovd %xmm6, %ecx + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 
0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + vextractf128 $1, %ymm6, %xmm14 + vpextrd $1, %xmm6, %r8d + movslq %ecx, %rcx + movslq %r8d, %r8 + vpextrd $2, %xmm6, %r9d + vpextrd $3, %xmm6, %r10d + vmovd (%rax,%rcx), %xmm4 + vmovd (%rax,%r8), %xmm5 + vpunpckldq %xmm5, %xmm4, %xmm9 + +/* R */ + vmovups 256+__svml_sexp10_data_internal(%rip), %ymm5 + vmovd %xmm14, %r11d + vfnmadd213ps %ymm3, %ymm0, %ymm5 + movslq %r9d, %r9 + movslq %r10d, %r10 + vfnmadd132ps 320+__svml_sexp10_data_internal(%rip), %ymm5, %ymm0 + vmovd (%rax,%r9), %xmm7 + vmovd (%rax,%r10), %xmm8 + vpunpckldq %xmm8, %xmm7, %xmm10 + +/* + * Polynomial + * exp10 = 2^N*(Tj+Tj*poly) + * poly(sN) = {1+later} a0+a1*sR + */ + vmovups 512+__svml_sexp10_data_internal(%rip), %ymm8 + vpunpcklqdq %xmm10, %xmm9, %xmm11 + vfmadd213ps 448+__svml_sexp10_data_internal(%rip), %ymm0, %ymm8 + vpextrd $1, %xmm14, %ecx + vpextrd $2, %xmm14, %r8d + vpextrd $3, %xmm14, %r9d + movslq %r11d, %r11 + movslq %ecx, %rcx + movslq %r8d, %r8 + movslq %r9d, %r9 + vfmadd213ps 384+__svml_sexp10_data_internal(%rip), %ymm0, %ymm8 + vmovd (%rax,%r11), %xmm12 + vmovd (%rax,%rcx), %xmm13 + vmovd (%rax,%r8), %xmm15 + vmovd (%rax,%r9), %xmm14 + vpunpckldq %xmm13, %xmm12, %xmm6 + vpunpckldq %xmm14, %xmm15, %xmm12 + +/* remove index bits */ + vpandn %ymm1, %ymm2, %ymm0 + vpunpcklqdq %xmm12, %xmm6, %xmm4 + +/* 2^N */ + vpslld $18, %ymm0, %ymm1 + 
vinsertf128 $1, %xmm4, %ymm11, %ymm7 + +/* Tj_l+Tj_h*poly */ + vfmadd213ps %ymm7, %ymm7, %ymm8 + +/* quick mul 2^N */ + vpaddd %ymm1, %ymm8, %ymm0 + +/* Finish */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 192(%rsp), %ymm11 + cfi_restore(94) + vmovups 224(%rsp), %ymm12 + cfi_restore(95) + vmovups 256(%rsp), %ymm13 + cfi_restore(96) + vmovups 288(%rsp), %ymm14 + cfi_restore(97) + vmovups 320(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovups %ymm3, 64(%rsp) + vmovups %ymm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 
0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovups 128(%rsp), %ymm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + call __svml_sexp10_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVdN8v_exp10f_avx2) + + .align 16,0x90 + +__svml_sexp10_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 2(%rdi), %edx + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_12 + movss (%rdi), %xmm8 + movss .FLT_57(%rip), %xmm0 + movl %eax, -12(%rsp) + comiss %xmm8, %xmm0 + jb .LBL_2_10 + comiss .FLT_58(%rip), %xmm8 + jb .LBL_2_9 + movss .FLT_60(%rip), %xmm0 + movaps %xmm8, %xmm6 + mulss %xmm8, %xmm0 + comiss .FLT_70(%rip), %xmm8 + movss %xmm0, -24(%rsp) + movss -24(%rsp), %xmm1 + movss .FLT_61(%rip), %xmm2 + movss .FLT_69(%rip), %xmm7 + addss %xmm2, %xmm1 + movss %xmm1, -20(%rsp) + movss -20(%rsp), %xmm3 + movswl -20(%rsp), %ecx + subss %xmm2, %xmm3 + movss %xmm3, -24(%rsp) + movss -24(%rsp), %xmm4 + mulss .FLT_62(%rip), 
%xmm4 + movss -24(%rsp), %xmm5 + subss %xmm4, %xmm6 + mulss .FLT_63(%rip), %xmm5 + subss %xmm5, %xmm6 + mulss %xmm6, %xmm7 + addss .FLT_68(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_67(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_66(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_65(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_64(%rip), %xmm7 + mulss %xmm6, %xmm7 + addss .FLT_74(%rip), %xmm7 + movss %xmm7, -16(%rsp) + jb .LBL_2_8 + lea 127(%rcx), %edx + cmpl $254, %edx + ja .LBL_2_7 + movzbl %dl, %edx + shll $7, %edx + movw %dx, -10(%rsp) + movss -16(%rsp), %xmm0 + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_7: + addl $126, %ecx + movzbl %cl, %ecx + movzwl -10(%rsp), %edx + shll $7, %ecx + andl $-32641, %edx + orl %ecx, %edx + movss -16(%rsp), %xmm0 + movw %dx, -10(%rsp) + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_73(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %edi + movl %edi, (%rsi) + ret + +.LBL_2_8: + addl $-69, %ecx + movzbl %cl, %ecx + movzwl -10(%rsp), %eax + shll $7, %ecx + andl $-32641, %eax + orl %ecx, %eax + movss -16(%rsp), %xmm0 + movw %ax, -10(%rsp) + movl $4, %eax + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_72(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %edx + movl %edx, (%rsi) + ret + +.LBL_2_9: + movss .FLT_71(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $4, %eax + ret + +.LBL_2_10: + movss .FLT_59(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $3, %eax + ret + +.LBL_2_12: + movb 3(%rdi), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_14 + +.LBL_2_13: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%rsi) + ret + +.LBL_2_14: + testl $8388607, (%rdi) + jne .LBL_2_13 + movl %eax, (%rsi) + ret + + cfi_endproc + + .type 
__svml_sexp10_cout_rare_internal,@function + .size __svml_sexp10_cout_rare_internal,.-__svml_sexp10_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_sexp10_data_internal: + .long 1065353216 + .long 1065536903 + .long 1065724611 + .long 1065916431 + .long 1066112450 + .long 1066312762 + .long 1066517459 + .long 1066726640 + .long 1066940400 + .long 1067158842 + .long 1067382066 + .long 1067610179 + .long 1067843287 + .long 1068081499 + .long 1068324927 + .long 1068573686 + .long 1068827891 + .long 1069087663 + .long 1069353124 + .long 1069624397 + .long 1069901610 + .long 1070184894 + .long 1070474380 + .long 1070770206 + .long 1071072509 + .long 1071381432 + .long 1071697119 + .long 1072019719 + .long 1072349383 + .long 1072686266 + .long 1073030525 + .long 1073382323 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .long 1121229432 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .long 1008345088 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 874153212 + .long 874153212 + .long 874153212 + .long 874153212 + .long 874153212 + .long 874153212 + .long 874153212 + .long 874153212 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 804046952 + .long 804046952 + .long 804046952 + .long 804046952 + .long 804046952 + .long 804046952 + .long 804046952 + .long 804046952 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .long 1075011099 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .long 1076472018 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.long	31
+	.long	31
+	.long	31
+	.long	31
+	.long	31
+	.long	31
+	.long	31
+	.long	31
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.long	2147483647
+	.long	2147483647
+	.long	2147483647
+	.long	2147483647
+	.long	2147483647
+	.long	2147483647
+	.long	2147483647
+	.long	2147483647
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.long	1108850712
+	.long	1108850712
+	.long	1108850712
+	.long	1108850712
+	.long	1108850712
+	.long	1108850712
+	.long	1108850712
+	.long	1108850712
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.type	__svml_sexp10_data_internal,@object
+	.size	__svml_sexp10_data_internal,768
+	.align 4
+
+.FLT_57:
+	.long	0x421a209b
+	.type	.FLT_57,@object
+	.size	.FLT_57,4
+	.align 4
+
+.FLT_58:
+	.long	0xc2349e35
+	.type	.FLT_58,@object
+	.size	.FLT_58,4
+	.align 4
+
+.FLT_59:
+	.long	0x7f7fffff
+	.type	.FLT_59,@object
+	.size	.FLT_59,4
+	.align 4
+
+.FLT_60:
+	.long	0x40549a78
+	.type	.FLT_60,@object
+	.size	.FLT_60,4
+	.align 4
+
+.FLT_61:
+	.long	0x4b400000
+	.type	.FLT_61,@object
+	.size	.FLT_61,4
+	.align 4
+
+.FLT_62:
+	.long	0x3e9a2000
+	.type	.FLT_62,@object
+	.size	.FLT_62,4
+	.align 4
+
+.FLT_63:
+	.long	0x369a84fc
+	.type	.FLT_63,@object
+	.size	.FLT_63,4
+	.align 4
+
+.FLT_64:
+	.long	0x40135d8e
+	.type	.FLT_64,@object
+	.size	.FLT_64,4
+	.align 4
+
+.FLT_65:
+	.long	0x4029a927
+	.type	.FLT_65,@object
+	.size	.FLT_65,4
+	.align 4
+
+.FLT_66:
+	.long	0x400237ac
+	.type	.FLT_66,@object
+	.size	.FLT_66,4
+	.align 4
+
+.FLT_67:
+	.long	0x3f95eb27
+	.type	.FLT_67,@object
+	.size	.FLT_67,4
+	.align 4
+
+.FLT_68:
+	.long	0x3f0ac64a
+	.type	.FLT_68,@object
+	.size	.FLT_68,4
+	.align 4
+
+.FLT_69:
+	.long	0x3e54fd23
+	.type	.FLT_69,@object
+	.size	.FLT_69,4
+	.align 4
+
+.FLT_70:
+	.long	0xc217b819
+	.type	.FLT_70,@object
+	.size	.FLT_70,4
+	.align 4
+
+.FLT_71:
+	.long	0x00000001
+	.type	.FLT_71,@object
+	.size	.FLT_71,4
+	.align 4
+
+.FLT_72:
+	.long	0x21800000
+	.type	.FLT_72,@object
+	.size	.FLT_72,4
+	.align 4
+
+.FLT_73:
+	.long	0x40000000
+	.type	.FLT_73,@object
+	.size	.FLT_73,4
+	.align 4
+
+.FLT_74:
+	.long	0x3f800000
+	.type	.FLT_74,@object
+	.size	.FLT_74,4
diff --git a/sysdeps/x86_64/fpu/svml_d_exp102_core.S b/sysdeps/x86_64/fpu/svml_d_exp102_core.S
new file mode 100644
index 0000000000..157fb3b7c0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp102_core.S
@@ -0,0 +1,29 @@
+/* Function exp10 vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_exp10)
+WRAPPER_IMPL_SSE2 exp10
+END (_ZGVbN2v_exp10)
+
+#ifndef USE_MULTIARCH
+libmvec_hidden_def (_ZGVbN2v_exp10)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_exp104_core.S b/sysdeps/x86_64/fpu/svml_d_exp104_core.S
new file mode 100644
index 0000000000..9b9d0a5d4b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp104_core.S
@@ -0,0 +1,29 @@
+/* Function exp10 vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_exp10)
+WRAPPER_IMPL_AVX _ZGVbN2v_exp10
+END (_ZGVdN4v_exp10)
+
+#ifndef USE_MULTIARCH
+libmvec_hidden_def (_ZGVdN4v_exp10)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S b/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
new file mode 100644
index 0000000000..1ba1a819ed
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
@@ -0,0 +1,25 @@
+/* Function exp10 vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_exp10)
+WRAPPER_IMPL_AVX _ZGVbN2v_exp10
+END (_ZGVcN4v_exp10)
diff --git a/sysdeps/x86_64/fpu/svml_d_exp108_core.S b/sysdeps/x86_64/fpu/svml_d_exp108_core.S
new file mode 100644
index 0000000000..a530dc12de
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp108_core.S
@@ -0,0 +1,25 @@
+/* Function exp10 vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_exp10)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_exp10
+END (_ZGVeN8v_exp10)
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
new file mode 100644
index 0000000000..e5043bc875
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
@@ -0,0 +1,25 @@
+/* Function exp10f vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_exp10f)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_exp10f
+END (_ZGVeN16v_exp10f)
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
new file mode 100644
index 0000000000..75e6637a82
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
@@ -0,0 +1,29 @@
+/* Function exp10f vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_exp10f)
+WRAPPER_IMPL_SSE2 exp10f
+END (_ZGVbN4v_exp10f)
+
+#ifndef USE_MULTIARCH
+libmvec_hidden_def (_ZGVbN4v_exp10f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
new file mode 100644
index 0000000000..d481d2dee9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
@@ -0,0 +1,29 @@
+/* Function exp10f vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_exp10f)
+WRAPPER_IMPL_AVX _ZGVbN4v_exp10f
+END (_ZGVdN8v_exp10f)
+
+#ifndef USE_MULTIARCH
+libmvec_hidden_def (_ZGVdN8v_exp10f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
new file mode 100644
index 0000000000..65944bd4d2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function exp10f vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN8v_exp10f)
+WRAPPER_IMPL_AVX _ZGVbN4v_exp10f
+END (_ZGVcN8v_exp10f)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
new file mode 100644
index 0000000000..7cdda9895b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
new file mode 100644
index 0000000000..7cdda9895b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
new file mode 100644
index 0000000000..7cdda9895b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
new file mode 100644
index 0000000000..b1461ed85e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC exp10
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index fdacdd1728..90eb5c3205 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVbN2v_erfc)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
 
 #define VEC_INT_TYPE __m128i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index c2c12533f2..5a960d003e 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVdN4v_erfc)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index acecf23719..37d9ce9cb7 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVcN4v_erfc)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
 
 #define VEC_INT_TYPE __m128i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index c7ef3e8356..d61ab42881 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVeN8v_erfc)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
new file mode 100644
index 0000000000..be3cdaa80d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
new file mode 100644
index 0000000000..be3cdaa80d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
new file mode 100644
index 0000000000..be3cdaa80d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
new file mode 100644
index 0000000000..06f447eb8d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC exp10f
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 379206fe0e..1470b4d890 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVeN16v_erfcf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
 
 #define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index aa756f97f5..7833482006 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVbN4v_erfcf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
 
 #define VEC_INT_TYPE __m128i
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 68eec1f1ed..6719c1ae56 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVdN8v_erfcf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index a4759d6156..40b852aad6 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVcN8v_erfcf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
 
 #define VEC_INT_TYPE __m128i