From patchwork Wed Nov 24 19:37:50 2021
X-Patchwork-Submitter: Sunil Pandey
X-Patchwork-Id: 48102
From: Sunil Pandey
Reply-To: Sunil K Pandey
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com
Subject: [PATCH 25/42] x86-64: Add vector exp2/exp2f implementation to libmvec
Date: Wed, 24 Nov 2021 11:37:50 -0800
Message-Id: <20211124193807.2093208-26-skpgkp2@gmail.com>
In-Reply-To: <20211124193807.2093208-1-skpgkp2@gmail.com>
References: <20211124193807.2093208-1-skpgkp2@gmail.com>
X-Mailer: git-send-email 2.31.1

Implement vectorized exp2/exp2f for libmvec, with SSE, AVX, AVX2 and AVX512 versions as required by the vector ABI. Also add accuracy and ABI tests for vector exp2/exp2f, with regenerated ulps.
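For reference, a minimal usage sketch (not part of the patch; the function name and build flags below are illustrative assumptions): the __DECL_SIMD_exp2 declarations added in bits/math-vector.h mark exp2/exp2f as having SIMD clones, so a compiler that honors them, e.g. GCC with -fopenmp-simd and -ffast-math, can replace the scalar calls in a loop like the one below with _ZGVbN2v_exp2, _ZGVdN4v_exp2 or _ZGVeN8v_exp2 (and the exp2f counterparts) according to the x86_64 vector function ABI, linking against -lmvec and -lm:

#include <math.h>

/* Hypothetical example: exp2 over an array.  The OpenMP SIMD hint lets
   the compiler call the libmvec vector variants added by this patch
   instead of iterating over scalar exp2.  */
void
exp2_array (double *out, const double *in, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = exp2 (in[i]);
}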
--- bits/libm-simd-decl-stubs.h | 11 + math/bits/mathcalls.h | 2 +- .../unix/sysv/linux/x86_64/libmvec.abilist | 8 + sysdeps/x86/fpu/bits/math-vector.h | 4 + sysdeps/x86_64/fpu/Makeconfig | 1 + sysdeps/x86_64/fpu/Versions | 2 + sysdeps/x86_64/fpu/libm-test-ulps | 20 + .../fpu/multiarch/svml_d_exp22_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_d_exp22_core.c | 27 + .../fpu/multiarch/svml_d_exp22_core_sse4.S | 1327 +++++++++++++++++ .../fpu/multiarch/svml_d_exp24_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_d_exp24_core.c | 27 + .../fpu/multiarch/svml_d_exp24_core_avx2.S | 1231 +++++++++++++++ .../fpu/multiarch/svml_d_exp28_core-avx2.S | 20 + .../x86_64/fpu/multiarch/svml_d_exp28_core.c | 27 + .../fpu/multiarch/svml_d_exp28_core_avx512.S | 829 ++++++++++ .../fpu/multiarch/svml_s_exp2f16_core-avx2.S | 20 + .../fpu/multiarch/svml_s_exp2f16_core.c | 28 + .../multiarch/svml_s_exp2f16_core_avx512.S | 511 +++++++ .../fpu/multiarch/svml_s_exp2f4_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_s_exp2f4_core.c | 28 + .../fpu/multiarch/svml_s_exp2f4_core_sse4.S | 923 ++++++++++++ .../fpu/multiarch/svml_s_exp2f8_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_s_exp2f8_core.c | 28 + .../fpu/multiarch/svml_s_exp2f8_core_avx2.S | 795 ++++++++++ sysdeps/x86_64/fpu/svml_d_exp22_core.S | 29 + sysdeps/x86_64/fpu/svml_d_exp24_core.S | 29 + sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S | 25 + sysdeps/x86_64/fpu/svml_d_exp28_core.S | 25 + sysdeps/x86_64/fpu/svml_s_exp2f16_core.S | 25 + sysdeps/x86_64/fpu/svml_s_exp2f4_core.S | 29 + sysdeps/x86_64/fpu/svml_s_exp2f8_core.S | 29 + sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S | 25 + .../x86_64/fpu/test-double-libmvec-exp2-avx.c | 1 + .../fpu/test-double-libmvec-exp2-avx2.c | 1 + .../fpu/test-double-libmvec-exp2-avx512f.c | 1 + sysdeps/x86_64/fpu/test-double-libmvec-exp2.c | 3 + .../x86_64/fpu/test-double-vlen2-wrappers.c | 1 + .../fpu/test-double-vlen4-avx2-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen4-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen8-wrappers.c | 1 + .../x86_64/fpu/test-float-libmvec-exp2f-avx.c | 1 + .../fpu/test-float-libmvec-exp2f-avx2.c | 1 + .../fpu/test-float-libmvec-exp2f-avx512f.c | 1 + sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c | 3 + .../x86_64/fpu/test-float-vlen16-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen4-wrappers.c | 1 + .../fpu/test-float-vlen8-avx2-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen8-wrappers.c | 1 + 49 files changed, 6184 insertions(+), 1 deletion(-) create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c create mode 100644 
sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_exp22_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S create mode 100644 sysdeps/x86_64/fpu/svml_d_exp28_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h index ff85f5be62..e40dbe2604 100644 --- a/bits/libm-simd-decl-stubs.h +++ b/bits/libm-simd-decl-stubs.h @@ -230,4 +230,15 @@ #define __DECL_SIMD_exp10f32x #define __DECL_SIMD_exp10f64x #define __DECL_SIMD_exp10f128x + +#define __DECL_SIMD_exp2 +#define __DECL_SIMD_exp2f +#define __DECL_SIMD_exp2l +#define __DECL_SIMD_exp2f16 +#define __DECL_SIMD_exp2f32 +#define __DECL_SIMD_exp2f64 +#define __DECL_SIMD_exp2f128 +#define __DECL_SIMD_exp2f32x +#define __DECL_SIMD_exp2f64x +#define __DECL_SIMD_exp2f128x #endif diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h index 6fc9d6f156..19873c4d07 100644 --- a/math/bits/mathcalls.h +++ b/math/bits/mathcalls.h @@ -127,7 +127,7 @@ __MATHCALL (logb,, (_Mdouble_ __x)); #ifdef __USE_ISOC99 /* Compute base-2 exponential of X. */ -__MATHCALL (exp2,, (_Mdouble_ __x)); +__MATHCALL_VEC (exp2,, (_Mdouble_ __x)); /* Compute base-2 logarithm of X. 
*/ __MATHCALL (log2,, (_Mdouble_ __x)); diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist index be309ddbf0..85c2ebe3f3 100644 --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist @@ -57,6 +57,7 @@ GLIBC_2.35 _ZGVbN2v_cosh F GLIBC_2.35 _ZGVbN2v_erf F GLIBC_2.35 _ZGVbN2v_erfc F GLIBC_2.35 _ZGVbN2v_exp10 F +GLIBC_2.35 _ZGVbN2v_exp2 F GLIBC_2.35 _ZGVbN2vv_atan2 F GLIBC_2.35 _ZGVbN4v_acosf F GLIBC_2.35 _ZGVbN4v_acoshf F @@ -69,6 +70,7 @@ GLIBC_2.35 _ZGVbN4v_coshf F GLIBC_2.35 _ZGVbN4v_erfcf F GLIBC_2.35 _ZGVbN4v_erff F GLIBC_2.35 _ZGVbN4v_exp10f F +GLIBC_2.35 _ZGVbN4v_exp2f F GLIBC_2.35 _ZGVbN4vv_atan2f F GLIBC_2.35 _ZGVcN4v_acos F GLIBC_2.35 _ZGVcN4v_acosh F @@ -81,6 +83,7 @@ GLIBC_2.35 _ZGVcN4v_cosh F GLIBC_2.35 _ZGVcN4v_erf F GLIBC_2.35 _ZGVcN4v_erfc F GLIBC_2.35 _ZGVcN4v_exp10 F +GLIBC_2.35 _ZGVcN4v_exp2 F GLIBC_2.35 _ZGVcN4vv_atan2 F GLIBC_2.35 _ZGVcN8v_acosf F GLIBC_2.35 _ZGVcN8v_acoshf F @@ -93,6 +96,7 @@ GLIBC_2.35 _ZGVcN8v_coshf F GLIBC_2.35 _ZGVcN8v_erfcf F GLIBC_2.35 _ZGVcN8v_erff F GLIBC_2.35 _ZGVcN8v_exp10f F +GLIBC_2.35 _ZGVcN8v_exp2f F GLIBC_2.35 _ZGVcN8vv_atan2f F GLIBC_2.35 _ZGVdN4v_acos F GLIBC_2.35 _ZGVdN4v_acosh F @@ -105,6 +109,7 @@ GLIBC_2.35 _ZGVdN4v_cosh F GLIBC_2.35 _ZGVdN4v_erf F GLIBC_2.35 _ZGVdN4v_erfc F GLIBC_2.35 _ZGVdN4v_exp10 F +GLIBC_2.35 _ZGVdN4v_exp2 F GLIBC_2.35 _ZGVdN4vv_atan2 F GLIBC_2.35 _ZGVdN8v_acosf F GLIBC_2.35 _ZGVdN8v_acoshf F @@ -117,6 +122,7 @@ GLIBC_2.35 _ZGVdN8v_coshf F GLIBC_2.35 _ZGVdN8v_erfcf F GLIBC_2.35 _ZGVdN8v_erff F GLIBC_2.35 _ZGVdN8v_exp10f F +GLIBC_2.35 _ZGVdN8v_exp2f F GLIBC_2.35 _ZGVdN8vv_atan2f F GLIBC_2.35 _ZGVeN16v_acosf F GLIBC_2.35 _ZGVeN16v_acoshf F @@ -129,6 +135,7 @@ GLIBC_2.35 _ZGVeN16v_coshf F GLIBC_2.35 _ZGVeN16v_erfcf F GLIBC_2.35 _ZGVeN16v_erff F GLIBC_2.35 _ZGVeN16v_exp10f F +GLIBC_2.35 _ZGVeN16v_exp2f F GLIBC_2.35 _ZGVeN16vv_atan2f F GLIBC_2.35 _ZGVeN8v_acos F GLIBC_2.35 _ZGVeN8v_acosh F @@ -141,4 +148,5 @@ GLIBC_2.35 _ZGVeN8v_cosh F GLIBC_2.35 _ZGVeN8v_erf F GLIBC_2.35 _ZGVeN8v_erfc F GLIBC_2.35 _ZGVeN8v_exp10 F +GLIBC_2.35 _ZGVeN8v_exp2 F GLIBC_2.35 _ZGVeN8vv_atan2 F diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h index 69d05d334d..fd1fde7a7a 100644 --- a/sysdeps/x86/fpu/bits/math-vector.h +++ b/sysdeps/x86/fpu/bits/math-vector.h @@ -106,6 +106,10 @@ # define __DECL_SIMD_exp10 __DECL_SIMD_x86_64 # undef __DECL_SIMD_exp10f # define __DECL_SIMD_exp10f __DECL_SIMD_x86_64 +# undef __DECL_SIMD_exp2 +# define __DECL_SIMD_exp2 __DECL_SIMD_x86_64 +# undef __DECL_SIMD_exp2f +# define __DECL_SIMD_exp2f __DECL_SIMD_x86_64 # endif #endif diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig index fe982c21a1..bcf2d5a0e9 100644 --- a/sysdeps/x86_64/fpu/Makeconfig +++ b/sysdeps/x86_64/fpu/Makeconfig @@ -36,6 +36,7 @@ libmvec-funcs = \ erfc \ exp \ exp10 \ + exp2 \ log \ pow \ sin \ diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions index f891c07550..055ad29b2c 100644 --- a/sysdeps/x86_64/fpu/Versions +++ b/sysdeps/x86_64/fpu/Versions @@ -25,6 +25,7 @@ libmvec { _ZGVbN2v_erf; _ZGVcN4v_erf; _ZGVdN4v_erf; _ZGVeN8v_erf; _ZGVbN2v_erfc; _ZGVcN4v_erfc; _ZGVdN4v_erfc; _ZGVeN8v_erfc; _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10; + _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2; _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2; _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf; _ZGVbN4v_acoshf; 
_ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf; @@ -37,6 +38,7 @@ libmvec { _ZGVbN4v_erfcf; _ZGVcN8v_erfcf; _ZGVdN8v_erfcf; _ZGVeN16v_erfcf; _ZGVbN4v_erff; _ZGVcN8v_erff; _ZGVdN8v_erff; _ZGVeN16v_erff; _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f; + _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f; _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f; } } diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps index 4465c0af19..0a0a2ab4fc 100644 --- a/sysdeps/x86_64/fpu/libm-test-ulps +++ b/sysdeps/x86_64/fpu/libm-test-ulps @@ -1453,6 +1453,26 @@ float: 1 float128: 2 ldouble: 1 +Function: "exp2_vlen16": +float: 2 + +Function: "exp2_vlen2": +double: 1 + +Function: "exp2_vlen4": +double: 1 +float: 2 + +Function: "exp2_vlen4_avx2": +double: 1 + +Function: "exp2_vlen8": +double: 1 +float: 2 + +Function: "exp2_vlen8_avx2": +float: 2 + Function: "exp_downward": double: 1 float: 1 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S new file mode 100644 index 0000000000..330260baaa --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized exp2, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVbN2v_exp2 _ZGVbN2v_exp2_sse2 +#include "../svml_d_exp22_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c new file mode 100644 index 0000000000..e0cf198030 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized exp2, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define SYMBOL_NAME _ZGVbN2v_exp2 +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN2v_exp2, __GI__ZGVbN2v_exp2, __redirect__ZGVbN2v_exp2) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S new file mode 100644 index 0000000000..0d820b07fe --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S @@ -0,0 +1,1327 @@ +/* Function exp2 vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * exp2(x) = 2^n * T[j] * (1 + P(y)) + * where + * x = m*(1/K) + y, y in [-1/K..1/K] + * m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + * values of 2^j/K are tabulated + * + * P(y) is a minimax polynomial approximation of exp2(x)-1 + * on small interval [-1/K..1/K] + * + * Special cases: + * + * exp2(NaN) = NaN + * exp2(+INF) = +INF + * exp2(-INF) = 0 + * exp2(x) = 1 for subnormals + * For IEEE double + * if x >= 1024.0 then exp2(x) overflows + * if x < -1076.0 then exp2(x) underflows + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVbN2v_exp2_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + movaps %xmm0, %xmm3 + movups 1024+__svml_dexp2_data_internal(%rip), %xmm5 + +/* out, basePtr, iIndex, iBaseOfs, iSize, iGran, iOfs */ + lea __svml_dexp2_data_internal(%rip), %r8 + +/* Load argument */ + movaps %xmm5, %xmm2 + addpd %xmm3, %xmm2 + movaps %xmm2, %xmm4 + subpd %xmm5, %xmm4 + movq 1408+__svml_dexp2_data_internal(%rip), %xmm6 + +/* Check for overflow/underflow */ + pshufd $221, %xmm3, %xmm0 + pand %xmm6, %xmm0 + +/* R */ + movaps %xmm3, %xmm6 + subpd %xmm4, %xmm6 + +/* + * Polynomial + * poly(dN) = a1*dR+...+a4*dR^4 + */ + movups 1280+__svml_dexp2_data_internal(%rip), %xmm4 + mulpd %xmm6, %xmm4 + addpd 1216+__svml_dexp2_data_internal(%rip), %xmm4 + mulpd %xmm6, %xmm4 + addpd 1152+__svml_dexp2_data_internal(%rip), %xmm4 + movdqu 1344+__svml_dexp2_data_internal(%rip), %xmm1 + +/* Index and lookup */ + movdqa %xmm1, %xmm5 + pandn %xmm2, %xmm1 + pand %xmm2, %xmm5 + +/* 2^N */ + psllq $45, %xmm1 + movd %xmm5, %eax + movq 1472+__svml_dexp2_data_internal(%rip), %xmm7 + pcmpgtd %xmm7, %xmm0 + pextrw $4, %xmm5, %ecx + +/* a1+...+a4*dR^3 ! 
*/ + mulpd %xmm6, %xmm4 + movmskps %xmm0, %edx + addpd 1088+__svml_dexp2_data_internal(%rip), %xmm4 + shll $3, %eax + shll $3, %ecx + movq (%r8,%rax), %xmm0 + movhpd (%r8,%rcx), %xmm0 + +/* dR=dR*dT */ + mulpd %xmm0, %xmm6 + +/* + * Reconstruction + * exp2 = {2^N later}*(Tj+Tj*poly) + * dN = dT+dT*dR*(a1+...+a4*dR^3) + */ + mulpd %xmm6, %xmm4 + addpd %xmm4, %xmm0 + +/* quick 2^N */ + paddq %xmm1, %xmm0 + andl $3, %edx + +/* Finish */ + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm3, 192(%rsp) + movups %xmm0, 256(%rsp) + je .LBL_1_2 + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $2, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 
0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,8), %rdi + lea 256(%rsp,%r12,8), %rsi + call __svml_dexp2_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN2v_exp2_sse4) + + .align 16,0x90 + +__svml_dexp2_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 6(%rdi), %edx + andl $32752, %edx + movsd (%rdi), %xmm5 + movsd %xmm5, -8(%rsp) + cmpl $32752, %edx + je .LBL_2_12 + movsd 1072+__dexp2_la__imldExp2HATab(%rip), %xmm0 + comisd %xmm5, %xmm0 + jbe .LBL_2_10 + comisd 1088+__dexp2_la__imldExp2HATab(%rip), %xmm5 + jbe .LBL_2_9 + movsd 1024+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movaps %xmm5, %xmm3 + lea __dexp2_la__imldExp2HATab(%rip), %r10 + addsd %xmm5, %xmm0 + movsd %xmm0, -24(%rsp) + movsd -24(%rsp), %xmm1 + movl -24(%rsp), %r8d + movl %r8d, %ecx + andl $63, %r8d + subsd 1024+__dexp2_la__imldExp2HATab(%rip), %xmm1 + movsd %xmm1, -16(%rsp) + lea 1(%r8,%r8), %r9d + movsd -16(%rsp), %xmm2 + lea (%r8,%r8), %edi + movsd 1064+__dexp2_la__imldExp2HATab(%rip), %xmm1 + subsd %xmm2, %xmm3 + mulsd %xmm3, %xmm1 + movsd (%r10,%rdi,8), %xmm4 + shrl $6, %ecx + addsd 1056+__dexp2_la__imldExp2HATab(%rip), %xmm1 + comisd 1080+__dexp2_la__imldExp2HATab(%rip), %xmm5 + mulsd %xmm3, %xmm1 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rdx + movq %rdx, -8(%rsp) + lea 1023(%rcx), %edx + addsd 1048+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd 1040+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd 1032+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd (%r10,%r9,8), %xmm1 + mulsd %xmm4, %xmm1 + addsd %xmm4, %xmm1 + jb .LBL_2_8 + andl $2047, %edx + cmpl $2046, %edx + ja .LBL_2_7 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + mulsd 1128+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_8: + addl $1123, %ecx + andl $2047, %ecx + movzwl -2(%rsp), %eax + shll $4, %ecx + andl $-32753, %eax + orl %ecx, %eax + movw %ax, -2(%rsp) + movl $4, %eax + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + mulsd 1136+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_9: + movsd 1096+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movl $4, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_10: + movsd 1104+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + 
+.LBL_2_12: + movb -1(%rsp), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_14 + +.LBL_2_13: + mulsd %xmm5, %xmm5 + movsd %xmm5, (%rsi) + ret + +.LBL_2_14: + testl $1048575, -4(%rsp) + jne .LBL_2_13 + cmpl $0, -8(%rsp) + jne .LBL_2_13 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rdx + movq %rdx, (%rsi) + ret + + cfi_endproc + + .type __svml_dexp2_cout_rare_internal,@function + .size __svml_dexp2_cout_rare_internal,.-__svml_dexp2_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dexp2_data_internal: + .long 0 + .long 1072693248 + .long 2851812149 + .long 1072698941 + .long 1048019041 + .long 1072704666 + .long 3899555717 + .long 1072710421 + .long 3541402996 + .long 1072716208 + .long 702412510 + .long 1072722027 + .long 410360776 + .long 1072727877 + .long 3402036099 + .long 1072733758 + .long 1828292879 + .long 1072739672 + .long 728909815 + .long 1072745618 + .long 852742562 + .long 1072751596 + .long 2952712987 + .long 1072757606 + .long 3490863953 + .long 1072763649 + .long 3228316108 + .long 1072769725 + .long 2930322912 + .long 1072775834 + .long 3366293073 + .long 1072781976 + .long 1014845819 + .long 1072788152 + .long 948735466 + .long 1072794361 + .long 3949972341 + .long 1072800603 + .long 2214878420 + .long 1072806880 + .long 828946858 + .long 1072813191 + .long 586995997 + .long 1072819536 + .long 2288159958 + .long 1072825915 + .long 2440944790 + .long 1072832329 + .long 1853186616 + .long 1072838778 + .long 1337108031 + .long 1072845262 + .long 1709341917 + .long 1072851781 + .long 3790955393 + .long 1072858335 + .long 4112506593 + .long 1072864925 + .long 3504003472 + .long 1072871551 + .long 2799960843 + .long 1072878213 + .long 2839424854 + .long 1072884911 + .long 171030293 + .long 1072891646 + .long 4232894513 + .long 1072898416 + .long 2992903935 + .long 1072905224 + .long 1603444721 + .long 1072912069 + .long 926591435 + .long 1072918951 + .long 1829099622 + .long 1072925870 + .long 887463927 + .long 1072932827 + .long 3272845541 + .long 1072939821 + .long 1276261410 + .long 1072946854 + .long 78413852 + .long 1072953925 + .long 569847338 + .long 1072961034 + .long 3645941911 + .long 1072968181 + .long 1617004845 + .long 1072975368 + .long 3978100823 + .long 1072982593 + .long 3049340112 + .long 1072989858 + .long 4040676318 + .long 1072997162 + .long 3577096743 + .long 1073004506 + .long 2583551245 + .long 1073011890 + .long 1990012071 + .long 1073019314 + .long 2731501122 + .long 1073026778 + .long 1453150082 + .long 1073034283 + .long 3395129871 + .long 1073041828 + .long 917841882 + .long 1073049415 + .long 3566716925 + .long 1073057042 + .long 3712504873 + .long 1073064711 + .long 2321106615 + .long 1073072422 + .long 363667784 + .long 1073080175 + .long 3111574537 + .long 1073087969 + .long 2956612997 + .long 1073095806 + .long 885834528 + .long 1073103686 + .long 2186617381 + .long 1073111608 + .long 3561793907 + .long 1073119573 + .long 1719614413 + .long 1073127582 + .long 1963711167 + .long 1073135634 + .long 1013258799 + .long 1073143730 + .long 4182873220 + .long 1073151869 + .long 3907805044 + .long 1073160053 + .long 1218806132 + .long 1073168282 + .long 1447192521 + .long 1073176555 + .long 1339972927 + .long 1073184873 + .long 1944781191 + .long 1073193236 + .long 19972402 + .long 1073201645 + .long 919555682 + .long 1073210099 + .long 1413356050 + .long 1073218599 + .long 2571947539 + .long 1073227145 + .long 1176749997 + .long 1073235738 + .long 2604962541 + .long 1073244377 + .long 3649726105 + .long 1073253063 + .long 
1110089947 + .long 1073261797 + .long 380978316 + .long 1073270578 + .long 2568320822 + .long 1073279406 + .long 194117574 + .long 1073288283 + .long 2966275557 + .long 1073297207 + .long 3418903055 + .long 1073306180 + .long 2682146384 + .long 1073315202 + .long 1892288442 + .long 1073324273 + .long 2191782032 + .long 1073333393 + .long 434316067 + .long 1073342563 + .long 2069751141 + .long 1073351782 + .long 3964284211 + .long 1073361051 + .long 2990417245 + .long 1073370371 + .long 321958744 + .long 1073379742 + .long 1434058175 + .long 1073389163 + .long 3218338682 + .long 1073398635 + .long 2572866477 + .long 1073408159 + .long 697153126 + .long 1073417735 + .long 3092190715 + .long 1073427362 + .long 2380618042 + .long 1073437042 + .long 4076559943 + .long 1073446774 + .long 815859274 + .long 1073456560 + .long 2420883922 + .long 1073466398 + .long 1540824585 + .long 1073476290 + .long 3716502172 + .long 1073486235 + .long 1610600570 + .long 1073496235 + .long 777507147 + .long 1073506289 + .long 2483480501 + .long 1073516397 + .long 3706687593 + .long 1073526560 + .long 1432208378 + .long 1073536779 + .long 1242007932 + .long 1073547053 + .long 135105010 + .long 1073557383 + .long 3707479175 + .long 1073567768 + .long 382305176 + .long 1073578211 + .long 64696965 + .long 1073588710 + .long 4076975200 + .long 1073599265 + .long 863738719 + .long 1073609879 + .long 351641897 + .long 1073620550 + .long 3884662774 + .long 1073631278 + .long 4224142467 + .long 1073642065 + .long 2728693978 + .long 1073652911 + .long 764307441 + .long 1073663816 + .long 3999357479 + .long 1073674779 + .long 929806999 + .long 1073685803 + .long 1533953344 + .long 1073696886 + .long 2912730644 + .long 1073708029 + .long 2174652632 + .long 1073719233 + .long 730821105 + .long 1073730498 + .long 0 + .long 1120403456 + .long 0 + .long 1120403456 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4277810821 + .long 1072049730 + .long 4277810821 + .long 1072049730 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4286761544 + .long 1070514109 + .long 4286761544 + .long 1070514109 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2978017349 + .long 1068264201 + .long 2978017349 + .long 1068264201 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1538336399 + .long 1065595563 + .long 1538336399 + .long 1065595563 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 127 + .long 0 + .long 127 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dexp2_data_internal,@object + .size __svml_dexp2_data_internal,1536 + .align 32 + +__dexp2_la__imldExp2HATab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 2631457885 + .long 3161546771 + .long 3541402996 + .long 1072716208 + .long 896005651 + .long 1015861842 + .long 410360776 + .long 1072727877 + .long 1642514529 + .long 1012987726 + .long 1828292879 + .long 1072739672 + .long 1568897901 + .long 1016568486 + .long 852742562 + .long 1072751596 + .long 1882168529 + .long 1010744893 + .long 3490863953 + .long 1072763649 + .long 707771662 + .long 3163903570 + .long 2930322912 + .long 1072775834 + .long 3117806614 + .long 3163670819 + .long 1014845819 + .long 1072788152 + .long 3936719688 + .long 3162512149 + .long 3949972341 + .long 1072800603 + .long 1058231231 + .long 1015777676 + .long 828946858 + .long 1072813191 + .long 1044000608 
+ .long 1016786167 + .long 2288159958 + .long 1072825915 + .long 1151779725 + .long 1015705409 + .long 1853186616 + .long 1072838778 + .long 3819481236 + .long 1016499965 + .long 1709341917 + .long 1072851781 + .long 2552227826 + .long 1015039787 + .long 4112506593 + .long 1072864925 + .long 1829350193 + .long 1015216097 + .long 2799960843 + .long 1072878213 + .long 1913391796 + .long 1015756674 + .long 171030293 + .long 1072891646 + .long 1303423926 + .long 1015238005 + .long 2992903935 + .long 1072905224 + .long 1574172746 + .long 1016061241 + .long 926591435 + .long 1072918951 + .long 3427487848 + .long 3163704045 + .long 887463927 + .long 1072932827 + .long 1049900754 + .long 3161575912 + .long 1276261410 + .long 1072946854 + .long 2804567149 + .long 1015390024 + .long 569847338 + .long 1072961034 + .long 1209502043 + .long 3159926671 + .long 1617004845 + .long 1072975368 + .long 1623370769 + .long 1011049453 + .long 3049340112 + .long 1072989858 + .long 3667985273 + .long 1013894369 + .long 3577096743 + .long 1073004506 + .long 3145379760 + .long 1014403278 + .long 1990012071 + .long 1073019314 + .long 7447438 + .long 3163526196 + .long 1453150082 + .long 1073034283 + .long 3171891295 + .long 3162037958 + .long 917841882 + .long 1073049415 + .long 419288974 + .long 1016280325 + .long 3712504873 + .long 1073064711 + .long 3793507337 + .long 1016095713 + .long 363667784 + .long 1073080175 + .long 728023093 + .long 1016345318 + .long 2956612997 + .long 1073095806 + .long 1005538728 + .long 3163304901 + .long 2186617381 + .long 1073111608 + .long 2018924632 + .long 3163803357 + .long 1719614413 + .long 1073127582 + .long 3210617384 + .long 3163796463 + .long 1013258799 + .long 1073143730 + .long 3094194670 + .long 3160631279 + .long 3907805044 + .long 1073160053 + .long 2119843535 + .long 3161988964 + .long 1447192521 + .long 1073176555 + .long 508946058 + .long 3162904882 + .long 1944781191 + .long 1073193236 + .long 3108873501 + .long 3162190556 + .long 919555682 + .long 1073210099 + .long 2882956373 + .long 1013312481 + .long 2571947539 + .long 1073227145 + .long 4047189812 + .long 3163777462 + .long 2604962541 + .long 1073244377 + .long 3631372142 + .long 3163870288 + .long 1110089947 + .long 1073261797 + .long 3253791412 + .long 1015920431 + .long 2568320822 + .long 1073279406 + .long 1509121860 + .long 1014756995 + .long 2966275557 + .long 1073297207 + .long 2339118633 + .long 3160254904 + .long 2682146384 + .long 1073315202 + .long 586480042 + .long 3163702083 + .long 2191782032 + .long 1073333393 + .long 730975783 + .long 1014083580 + .long 2069751141 + .long 1073351782 + .long 576856675 + .long 3163014404 + .long 2990417245 + .long 1073370371 + .long 3552361237 + .long 3163667409 + .long 1434058175 + .long 1073389163 + .long 1853053619 + .long 1015310724 + .long 2572866477 + .long 1073408159 + .long 2462790535 + .long 1015814775 + .long 3092190715 + .long 1073427362 + .long 1457303226 + .long 3159737305 + .long 4076559943 + .long 1073446774 + .long 950899508 + .long 3160987380 + .long 2420883922 + .long 1073466398 + .long 174054861 + .long 1014300631 + .long 3716502172 + .long 1073486235 + .long 816778419 + .long 1014197934 + .long 777507147 + .long 1073506289 + .long 3507050924 + .long 1015341199 + .long 3706687593 + .long 1073526560 + .long 1821514088 + .long 1013410604 + .long 1242007932 + .long 1073547053 + .long 1073740399 + .long 3163532637 + .long 3707479175 + .long 1073567768 + .long 2789017511 + .long 1014276997 + .long 64696965 + .long 1073588710 + .long 3586233004 + 
.long 1015962192 + .long 863738719 + .long 1073609879 + .long 129252895 + .long 3162690849 + .long 3884662774 + .long 1073631278 + .long 1614448851 + .long 1014281732 + .long 2728693978 + .long 1073652911 + .long 2413007344 + .long 3163551506 + .long 3999357479 + .long 1073674779 + .long 1101668360 + .long 1015989180 + .long 1533953344 + .long 1073696886 + .long 835814894 + .long 1015702697 + .long 2174652632 + .long 1073719233 + .long 1301400989 + .long 1014466875 + .long 0 + .long 1121452032 + .long 4277811695 + .long 1072049730 + .long 4286751290 + .long 1070514109 + .long 3607585384 + .long 1068264200 + .long 871937163 + .long 1065595565 + .long 3302507530 + .long 1062590576 + .long 0 + .long 1083179008 + .long 0 + .long 3230658560 + .long 0 + .long 3230714880 + .long 1 + .long 1048576 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 0 + .long 967835648 + .long 0 + .long 0 + .type __dexp2_la__imldExp2HATab,@object + .size __dexp2_la__imldExp2HATab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S new file mode 100644 index 0000000000..51c5de1100 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized exp2, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVdN4v_exp2 _ZGVdN4v_exp2_sse_wrapper +#include "../svml_d_exp24_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c new file mode 100644 index 0000000000..bb979afde6 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized exp2, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define SYMBOL_NAME _ZGVdN4v_exp2 +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4v_exp2, __GI__ZGVdN4v_exp2, __redirect__ZGVdN4v_exp2) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S new file mode 100644 index 0000000000..b1012c61ce --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S @@ -0,0 +1,1231 @@ +/* Function exp2 vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * exp2(x) = 2^n * T[j] * (1 + P(y)) + * where + * x = m*(1/K) + y, y in [-1/K..1/K] + * m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + * values of 2^j/K are tabulated + * + * P(y) is a minimax polynomial approximation of exp2(x)-1 + * on small interval [-1/K..1/K] + * + * Special cases: + * + * exp2(NaN) = NaN + * exp2(+INF) = +INF + * exp2(-INF) = 0 + * exp2(x) = 1 for subnormals + * For IEEE double + * if x >= 1024.0 then exp2(x) overflows + * if x < -1076.0 then exp2(x) underflows + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVdN4v_exp2_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + +/* out, basePtr, iIndex, iBaseOfs, iSize, iGran, iOfs */ + lea __svml_dexp2_data_internal(%rip), %rax + vmovupd 1024+__svml_dexp2_data_internal(%rip), %ymm4 + vmovupd 1344+__svml_dexp2_data_internal(%rip), %ymm3 + vmovups %ymm13, 256(%rsp) + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + +/* + * Polynomial + * poly(dN) = a1*dR+...+a4*dR^4 + */ + vmovupd 1280+__svml_dexp2_data_internal(%rip), %ymm13 + vmovups %ymm8, 32(%rsp) + vmovups %ymm9, 96(%rsp) + vmovups %ymm10, 160(%rsp) + vmovups %ymm12, 224(%rsp) + vmovups %ymm11, 192(%rsp) + vmovups %ymm15, 320(%rsp) + vmovups %ymm14, 288(%rsp) + vmovapd %ymm0, %ymm1 + +/* Load argument */ + vaddpd %ymm4, %ymm1, %ymm2 + vsubpd %ymm4, %ymm2, %ymm0 + +/* Check for overflow/underflow */ + vextractf128 $1, %ymm1, %xmm5 + +/* R */ + vsubpd %ymm0, %ymm1, %ymm0 + vshufps $221, %xmm5, %xmm1, %xmm6 + +/* Index and lookup */ + vandps %ymm3, %ymm2, %ymm5 + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 
0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + vfmadd213pd 1216+__svml_dexp2_data_internal(%rip), %ymm0, %ymm13 + vandps 1408+__svml_dexp2_data_internal(%rip), %xmm6, %xmm7 + vpcmpgtd 1472+__svml_dexp2_data_internal(%rip), %xmm7, %xmm4 + vmovmskps %xmm4, %edx + vpandn %ymm2, %ymm3, %ymm2 + vfmadd213pd 1152+__svml_dexp2_data_internal(%rip), %ymm0, %ymm13 + +/* 2^N */ + vpsllq $45, %ymm2, %ymm3 + +/* a1+...+a4*dR^3 ! */ + vfmadd213pd 1088+__svml_dexp2_data_internal(%rip), %ymm0, %ymm13 + vextractf128 $1, %ymm5, %xmm8 + vmovd %xmm5, %ecx + vmovd %xmm8, %r9d + shll $3, %ecx + vpextrd $2, %xmm5, %r8d + shll $3, %r9d + vpextrd $2, %xmm8, %r10d + shll $3, %r8d + vmovq (%rax,%rcx), %xmm6 + shll $3, %r10d + vmovq (%rax,%r9), %xmm9 + vmovhpd (%rax,%r8), %xmm6, %xmm7 + vmovhpd (%rax,%r10), %xmm9, %xmm10 + vinsertf128 $1, %xmm10, %ymm7, %ymm12 + +/* dR=dR*dT */ + vmulpd %ymm0, %ymm12, %ymm11 + +/* + * Reconstruction + * exp2 = {2^N later}*(Tj+Tj*poly) + * dN = dT+dT*dR*(a1+...+a4*dR^3) + */ + vfmadd213pd %ymm12, %ymm11, %ymm13 + +/* quick 2^N */ + vpaddq %ymm3, %ymm13, %ymm0 + +/* Finish */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 192(%rsp), %ymm11 + cfi_restore(94) + vmovups 224(%rsp), %ymm12 + cfi_restore(95) + vmovups 256(%rsp), %ymm13 + cfi_restore(96) + vmovups 288(%rsp), %ymm14 + cfi_restore(97) + vmovups 320(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovupd %ymm1, 64(%rsp) + vmovupd %ymm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 
0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovupd 128(%rsp), %ymm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + call __svml_dexp2_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVdN4v_exp2_avx2) + + .align 16,0x90 + +__svml_dexp2_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 6(%rdi), %edx + andl $32752, %edx + movsd (%rdi), %xmm5 + movsd %xmm5, -8(%rsp) + cmpl $32752, %edx + je .LBL_2_12 + movsd 1072+__dexp2_la__imldExp2HATab(%rip), %xmm0 + comisd %xmm5, %xmm0 + jbe .LBL_2_10 + comisd 1088+__dexp2_la__imldExp2HATab(%rip), %xmm5 + jbe .LBL_2_9 + movsd 1024+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movaps %xmm5, %xmm3 + lea __dexp2_la__imldExp2HATab(%rip), %r10 + addsd %xmm5, %xmm0 + movsd %xmm0, -24(%rsp) + movsd -24(%rsp), %xmm1 + movl -24(%rsp), %r8d + movl %r8d, %ecx + andl $63, %r8d + subsd 1024+__dexp2_la__imldExp2HATab(%rip), %xmm1 + movsd %xmm1, -16(%rsp) + lea 1(%r8,%r8), %r9d + movsd -16(%rsp), %xmm2 + lea (%r8,%r8), %edi + movsd 1064+__dexp2_la__imldExp2HATab(%rip), %xmm1 + subsd %xmm2, %xmm3 + mulsd %xmm3, %xmm1 + movsd (%r10,%rdi,8), %xmm4 + shrl $6, %ecx + addsd 1056+__dexp2_la__imldExp2HATab(%rip), %xmm1 + comisd 1080+__dexp2_la__imldExp2HATab(%rip), %xmm5 + mulsd %xmm3, %xmm1 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rdx + movq %rdx, -8(%rsp) + lea 1023(%rcx), %edx + addsd 1048+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd 1040+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd 1032+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd (%r10,%r9,8), %xmm1 + mulsd %xmm4, %xmm1 + addsd %xmm4, %xmm1 + jb .LBL_2_8 + andl $2047, %edx + cmpl $2046, %edx + ja .LBL_2_7 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + mulsd 1128+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_8: + addl $1123, %ecx + andl $2047, %ecx + movzwl -2(%rsp), %eax + shll $4, %ecx + andl $-32753, %eax + orl %ecx, %eax + movw %ax, -2(%rsp) + movl $4, %eax + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + mulsd 1136+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_9: + movsd 1096+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movl $4, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_10: + movsd 1104+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_12: + movb -1(%rsp), %dl + andb $-128, %dl + cmpb $-128, 
%dl + je .LBL_2_14 + +.LBL_2_13: + mulsd %xmm5, %xmm5 + movsd %xmm5, (%rsi) + ret + +.LBL_2_14: + testl $1048575, -4(%rsp) + jne .LBL_2_13 + cmpl $0, -8(%rsp) + jne .LBL_2_13 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rdx + movq %rdx, (%rsi) + ret + + cfi_endproc + + .type __svml_dexp2_cout_rare_internal,@function + .size __svml_dexp2_cout_rare_internal,.-__svml_dexp2_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dexp2_data_internal: + .long 0 + .long 1072693248 + .long 2851812149 + .long 1072698941 + .long 1048019041 + .long 1072704666 + .long 3899555717 + .long 1072710421 + .long 3541402996 + .long 1072716208 + .long 702412510 + .long 1072722027 + .long 410360776 + .long 1072727877 + .long 3402036099 + .long 1072733758 + .long 1828292879 + .long 1072739672 + .long 728909815 + .long 1072745618 + .long 852742562 + .long 1072751596 + .long 2952712987 + .long 1072757606 + .long 3490863953 + .long 1072763649 + .long 3228316108 + .long 1072769725 + .long 2930322912 + .long 1072775834 + .long 3366293073 + .long 1072781976 + .long 1014845819 + .long 1072788152 + .long 948735466 + .long 1072794361 + .long 3949972341 + .long 1072800603 + .long 2214878420 + .long 1072806880 + .long 828946858 + .long 1072813191 + .long 586995997 + .long 1072819536 + .long 2288159958 + .long 1072825915 + .long 2440944790 + .long 1072832329 + .long 1853186616 + .long 1072838778 + .long 1337108031 + .long 1072845262 + .long 1709341917 + .long 1072851781 + .long 3790955393 + .long 1072858335 + .long 4112506593 + .long 1072864925 + .long 3504003472 + .long 1072871551 + .long 2799960843 + .long 1072878213 + .long 2839424854 + .long 1072884911 + .long 171030293 + .long 1072891646 + .long 4232894513 + .long 1072898416 + .long 2992903935 + .long 1072905224 + .long 1603444721 + .long 1072912069 + .long 926591435 + .long 1072918951 + .long 1829099622 + .long 1072925870 + .long 887463927 + .long 1072932827 + .long 3272845541 + .long 1072939821 + .long 1276261410 + .long 1072946854 + .long 78413852 + .long 1072953925 + .long 569847338 + .long 1072961034 + .long 3645941911 + .long 1072968181 + .long 1617004845 + .long 1072975368 + .long 3978100823 + .long 1072982593 + .long 3049340112 + .long 1072989858 + .long 4040676318 + .long 1072997162 + .long 3577096743 + .long 1073004506 + .long 2583551245 + .long 1073011890 + .long 1990012071 + .long 1073019314 + .long 2731501122 + .long 1073026778 + .long 1453150082 + .long 1073034283 + .long 3395129871 + .long 1073041828 + .long 917841882 + .long 1073049415 + .long 3566716925 + .long 1073057042 + .long 3712504873 + .long 1073064711 + .long 2321106615 + .long 1073072422 + .long 363667784 + .long 1073080175 + .long 3111574537 + .long 1073087969 + .long 2956612997 + .long 1073095806 + .long 885834528 + .long 1073103686 + .long 2186617381 + .long 1073111608 + .long 3561793907 + .long 1073119573 + .long 1719614413 + .long 1073127582 + .long 1963711167 + .long 1073135634 + .long 1013258799 + .long 1073143730 + .long 4182873220 + .long 1073151869 + .long 3907805044 + .long 1073160053 + .long 1218806132 + .long 1073168282 + .long 1447192521 + .long 1073176555 + .long 1339972927 + .long 1073184873 + .long 1944781191 + .long 1073193236 + .long 19972402 + .long 1073201645 + .long 919555682 + .long 1073210099 + .long 1413356050 + .long 1073218599 + .long 2571947539 + .long 1073227145 + .long 1176749997 + .long 1073235738 + .long 2604962541 + .long 1073244377 + .long 3649726105 + .long 1073253063 + .long 1110089947 + .long 1073261797 + .long 380978316 + .long 1073270578 + 
.long 2568320822 + .long 1073279406 + .long 194117574 + .long 1073288283 + .long 2966275557 + .long 1073297207 + .long 3418903055 + .long 1073306180 + .long 2682146384 + .long 1073315202 + .long 1892288442 + .long 1073324273 + .long 2191782032 + .long 1073333393 + .long 434316067 + .long 1073342563 + .long 2069751141 + .long 1073351782 + .long 3964284211 + .long 1073361051 + .long 2990417245 + .long 1073370371 + .long 321958744 + .long 1073379742 + .long 1434058175 + .long 1073389163 + .long 3218338682 + .long 1073398635 + .long 2572866477 + .long 1073408159 + .long 697153126 + .long 1073417735 + .long 3092190715 + .long 1073427362 + .long 2380618042 + .long 1073437042 + .long 4076559943 + .long 1073446774 + .long 815859274 + .long 1073456560 + .long 2420883922 + .long 1073466398 + .long 1540824585 + .long 1073476290 + .long 3716502172 + .long 1073486235 + .long 1610600570 + .long 1073496235 + .long 777507147 + .long 1073506289 + .long 2483480501 + .long 1073516397 + .long 3706687593 + .long 1073526560 + .long 1432208378 + .long 1073536779 + .long 1242007932 + .long 1073547053 + .long 135105010 + .long 1073557383 + .long 3707479175 + .long 1073567768 + .long 382305176 + .long 1073578211 + .long 64696965 + .long 1073588710 + .long 4076975200 + .long 1073599265 + .long 863738719 + .long 1073609879 + .long 351641897 + .long 1073620550 + .long 3884662774 + .long 1073631278 + .long 4224142467 + .long 1073642065 + .long 2728693978 + .long 1073652911 + .long 764307441 + .long 1073663816 + .long 3999357479 + .long 1073674779 + .long 929806999 + .long 1073685803 + .long 1533953344 + .long 1073696886 + .long 2912730644 + .long 1073708029 + .long 2174652632 + .long 1073719233 + .long 730821105 + .long 1073730498 + .long 0 + .long 1120403456 + .long 0 + .long 1120403456 + .long 0 + .long 1120403456 + .long 0 + .long 1120403456 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4277810821 + .long 1072049730 + .long 4277810821 + .long 1072049730 + .long 4277810821 + .long 1072049730 + .long 4277810821 + .long 1072049730 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4286761544 + .long 1070514109 + .long 4286761544 + .long 1070514109 + .long 4286761544 + .long 1070514109 + .long 4286761544 + .long 1070514109 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2978017349 + .long 1068264201 + .long 2978017349 + .long 1068264201 + .long 2978017349 + .long 1068264201 + .long 2978017349 + .long 1068264201 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .long 1538336399 + .long 1065595563 + .long 1538336399 + .long 1065595563 + .long 1538336399 + .long 1065595563 + .long 1538336399 + .long 1065595563 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 127 + .long 0 + .long 127 + .long 0 + .long 127 + .long 0 + .long 127 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .long 1083174911 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dexp2_data_internal,@object + .size __svml_dexp2_data_internal,1536 + .align 32 + +__dexp2_la__imldExp2HATab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 2631457885 + .long 3161546771 + .long 3541402996 + .long 1072716208 + .long 896005651 + .long 1015861842 + .long 410360776 + .long 1072727877 + .long 1642514529 + .long 1012987726 + .long 1828292879 + .long 1072739672 + .long 1568897901 + .long 1016568486 + .long 852742562 + .long 1072751596 + .long 1882168529 + .long 1010744893 + .long 3490863953 + .long 1072763649 + .long 707771662 + .long 3163903570 + .long 2930322912 + .long 1072775834 + .long 3117806614 + .long 3163670819 + .long 1014845819 + .long 1072788152 + .long 3936719688 + .long 3162512149 + .long 3949972341 + .long 1072800603 + .long 1058231231 + .long 1015777676 + .long 828946858 + .long 1072813191 + .long 1044000608 + .long 1016786167 + .long 2288159958 + .long 1072825915 + .long 1151779725 + .long 1015705409 + .long 1853186616 + .long 1072838778 + .long 3819481236 + .long 1016499965 + .long 1709341917 + .long 1072851781 + .long 2552227826 + .long 1015039787 + .long 4112506593 + .long 1072864925 + .long 1829350193 + .long 1015216097 + .long 2799960843 + .long 1072878213 + .long 1913391796 + .long 1015756674 + .long 171030293 + .long 1072891646 + .long 1303423926 + .long 1015238005 + .long 2992903935 + .long 1072905224 + .long 1574172746 + .long 1016061241 + .long 926591435 + .long 1072918951 + .long 3427487848 + .long 3163704045 + .long 887463927 + .long 1072932827 + .long 1049900754 + .long 3161575912 + .long 1276261410 + .long 1072946854 + .long 2804567149 + .long 1015390024 + .long 
569847338 + .long 1072961034 + .long 1209502043 + .long 3159926671 + .long 1617004845 + .long 1072975368 + .long 1623370769 + .long 1011049453 + .long 3049340112 + .long 1072989858 + .long 3667985273 + .long 1013894369 + .long 3577096743 + .long 1073004506 + .long 3145379760 + .long 1014403278 + .long 1990012071 + .long 1073019314 + .long 7447438 + .long 3163526196 + .long 1453150082 + .long 1073034283 + .long 3171891295 + .long 3162037958 + .long 917841882 + .long 1073049415 + .long 419288974 + .long 1016280325 + .long 3712504873 + .long 1073064711 + .long 3793507337 + .long 1016095713 + .long 363667784 + .long 1073080175 + .long 728023093 + .long 1016345318 + .long 2956612997 + .long 1073095806 + .long 1005538728 + .long 3163304901 + .long 2186617381 + .long 1073111608 + .long 2018924632 + .long 3163803357 + .long 1719614413 + .long 1073127582 + .long 3210617384 + .long 3163796463 + .long 1013258799 + .long 1073143730 + .long 3094194670 + .long 3160631279 + .long 3907805044 + .long 1073160053 + .long 2119843535 + .long 3161988964 + .long 1447192521 + .long 1073176555 + .long 508946058 + .long 3162904882 + .long 1944781191 + .long 1073193236 + .long 3108873501 + .long 3162190556 + .long 919555682 + .long 1073210099 + .long 2882956373 + .long 1013312481 + .long 2571947539 + .long 1073227145 + .long 4047189812 + .long 3163777462 + .long 2604962541 + .long 1073244377 + .long 3631372142 + .long 3163870288 + .long 1110089947 + .long 1073261797 + .long 3253791412 + .long 1015920431 + .long 2568320822 + .long 1073279406 + .long 1509121860 + .long 1014756995 + .long 2966275557 + .long 1073297207 + .long 2339118633 + .long 3160254904 + .long 2682146384 + .long 1073315202 + .long 586480042 + .long 3163702083 + .long 2191782032 + .long 1073333393 + .long 730975783 + .long 1014083580 + .long 2069751141 + .long 1073351782 + .long 576856675 + .long 3163014404 + .long 2990417245 + .long 1073370371 + .long 3552361237 + .long 3163667409 + .long 1434058175 + .long 1073389163 + .long 1853053619 + .long 1015310724 + .long 2572866477 + .long 1073408159 + .long 2462790535 + .long 1015814775 + .long 3092190715 + .long 1073427362 + .long 1457303226 + .long 3159737305 + .long 4076559943 + .long 1073446774 + .long 950899508 + .long 3160987380 + .long 2420883922 + .long 1073466398 + .long 174054861 + .long 1014300631 + .long 3716502172 + .long 1073486235 + .long 816778419 + .long 1014197934 + .long 777507147 + .long 1073506289 + .long 3507050924 + .long 1015341199 + .long 3706687593 + .long 1073526560 + .long 1821514088 + .long 1013410604 + .long 1242007932 + .long 1073547053 + .long 1073740399 + .long 3163532637 + .long 3707479175 + .long 1073567768 + .long 2789017511 + .long 1014276997 + .long 64696965 + .long 1073588710 + .long 3586233004 + .long 1015962192 + .long 863738719 + .long 1073609879 + .long 129252895 + .long 3162690849 + .long 3884662774 + .long 1073631278 + .long 1614448851 + .long 1014281732 + .long 2728693978 + .long 1073652911 + .long 2413007344 + .long 3163551506 + .long 3999357479 + .long 1073674779 + .long 1101668360 + .long 1015989180 + .long 1533953344 + .long 1073696886 + .long 835814894 + .long 1015702697 + .long 2174652632 + .long 1073719233 + .long 1301400989 + .long 1014466875 + .long 0 + .long 1121452032 + .long 4277811695 + .long 1072049730 + .long 4286751290 + .long 1070514109 + .long 3607585384 + .long 1068264200 + .long 871937163 + .long 1065595565 + .long 3302507530 + .long 1062590576 + .long 0 + .long 1083179008 + .long 0 + .long 3230658560 + .long 0 + .long 3230714880 + .long 1 + 
.long 1048576 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 0 + .long 967835648 + .long 0 + .long 0 + .type __dexp2_la__imldExp2HATab,@object + .size __dexp2_la__imldExp2HATab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S new file mode 100644 index 0000000000..c9c17f0aaa --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized exp2, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVeN8v_exp2 _ZGVeN8v_exp2_avx2_wrapper +#include "../svml_d_exp28_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c new file mode 100644 index 0000000000..3be9e88e98 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized exp2, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define SYMBOL_NAME _ZGVeN8v_exp2 +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8v_exp2, __GI__ZGVeN8v_exp2, __redirect__ZGVeN8v_exp2) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S new file mode 100644 index 0000000000..5bb1a06ab7 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S @@ -0,0 +1,829 @@ +/* Function exp2 vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version.
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +/* + * ALGORITHM DESCRIPTION: + * + * Double precision mantissa represented as: 1.b1b2b3 ... b52 + * Constant for double precision: S = 2^48 x 1.5 + * + * 2^X = 2^Xo x 2^{X-Xo} + * 2^X = 2^K x 2^fo x 2^{X-Xo} + * 2^X = 2^K x 2^fo x 2^r + * + * 2^K --> Manual scaling + * 2^fo --> Table lookup + * r --> 1 + poly (r = X - Xo) + * + * Xo = K + fo + * Xo = K + 0.x1x2x3x4 + * + * r = X - Xo + * = Vreduce(X, imm) + * = X - VRndScale(X, imm), where Xo = VRndScale(X, imm) + * + * Rnd(S + X) = S + Xo, where S is selected as S = 2^48 x 1.5 + * S + X = S + floor(X) + 0.x1x2x3x4 + * Rnd(S + X) = Rnd(2^48 x 1.5 + X) + * (Note: 2^exp x 1.b1b2b3 ... b52, 2^{exp-52} = 2^-4 for exp=48) + * + * exp2(x) = 2^K x 2^fo x (1 + poly(r)), where 2^r = 1 + poly(r) + * + * Scale back: + * dest = src1 x 2^floor(src2) + * + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVeN8v_exp2_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $192, %rsp + vmovups 384+__svml_dexp2_data_internal_avx512(%rip), %zmm14 + vmovups 448+__svml_dexp2_data_internal_avx512(%rip), %zmm6 + +/* + * Reduced argument + * where VREDUCE is available + */ + vreducepd $65, {sae}, %zmm0, %zmm10 + vmovups 320+__svml_dexp2_data_internal_avx512(%rip), %zmm7 + vmovups 512+__svml_dexp2_data_internal_avx512(%rip), %zmm3 + vmovups 256+__svml_dexp2_data_internal_avx512(%rip), %zmm8 + vmovups __svml_dexp2_data_internal_avx512(%rip), %zmm13 + +/* c6*r + c5 */ + vfmadd231pd {rn-sae}, %zmm10, %zmm6, %zmm14 + vmovups 192+__svml_dexp2_data_internal_avx512(%rip), %zmm9 + vmovups 640+__svml_dexp2_data_internal_avx512(%rip), %zmm2 + +/* + * + * HA + * Variables and constants + * Load constants and vector(s) + */ + vmovups 128+__svml_dexp2_data_internal_avx512(%rip), %zmm11 + +/* c6*r^2 + c5*r + c4 */ + vfmadd213pd {rn-sae}, %zmm7, %zmm10, %zmm14 + +/* + * Integer form of K+0.b1b2b3b4 in lower bits - call K_plus_f0 + * Mantissa of normalized double precision FP: 1.b1b2...b52 + */ + vaddpd {rd-sae}, %zmm3, %zmm0, %zmm4 + vandpd 576+__svml_dexp2_data_internal_avx512(%rip), %zmm0, %zmm1 + +/* c6*r^3 + c5*r^2 + c4*r + c3 */ + vfmadd213pd {rn-sae}, %zmm8, %zmm10, %zmm14 + vcmppd $29, {sae}, %zmm2, %zmm1, %k0 + +/* c6*r^4 + c5*r^3 + c4*r^2 + c3*r + c2 */ + vfmadd213pd {rn-sae}, %zmm9, %zmm10, %zmm14 + kmovw %k0, %edx + +/* c6*r^5 + c5*r^4 + c4*r^3 + c3*r^2 + c2*r + c1 */ + vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm14 + +/* Table value: 2^(0.b1b2b3b4) */ + vpandq 704+__svml_dexp2_data_internal_avx512(%rip), %zmm4, %zmm5 + vpermt2pd 64+__svml_dexp2_data_internal_avx512(%rip), %zmm5, %zmm13 + +/* T*r */ + vmulpd {rn-sae}, %zmm10, %zmm13, %zmm12 + +/* T + (T*r*(c6*r^5 + c5*r^4 + c4*r^3 + c3*r^2 + c2*r + c1)) */ + vfmadd213pd {rn-sae}, %zmm13, %zmm12, %zmm14 + +/* Scaling placed at the end to avoid accuracy loss when T*r*scale underflows */ + vscalefpd {rn-sae}, %zmm0, %zmm14, %zmm1 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovaps %zmm1, %zmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm0, 64(%rsp) +
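/* Slow path: spill the input vector and the fast-path result, then recompute each lane whose bit is set in %edx with the scalar fallback __svml_dexp2_cout_rare_internal and merge it back before returning. */ +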
vmovups %zmm1, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), %k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 128(%rsp), %zmm1 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + call __svml_dexp2_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN8v_exp2_skx) + + .align 16,0x90 + +__svml_dexp2_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 6(%rdi), %edx + andl $32752, %edx + movsd (%rdi), %xmm5 + movsd %xmm5, -8(%rsp) + cmpl $32752, %edx + je .LBL_2_12 + movsd 1072+__dexp2_la__imldExp2HATab(%rip), %xmm0 + comisd %xmm5, %xmm0 + jbe .LBL_2_10 + comisd 1088+__dexp2_la__imldExp2HATab(%rip), %xmm5 + jbe .LBL_2_9 + movsd 1024+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movaps %xmm5, %xmm3 + lea __dexp2_la__imldExp2HATab(%rip), %r10 + addsd %xmm5, %xmm0 + movsd %xmm0, -24(%rsp) + movsd -24(%rsp), %xmm1 + movl -24(%rsp), %r8d + movl %r8d, %ecx + andl $63, %r8d + subsd 1024+__dexp2_la__imldExp2HATab(%rip), %xmm1 + movsd %xmm1, -16(%rsp) + lea 1(%r8,%r8), %r9d + movsd -16(%rsp), %xmm2 + lea (%r8,%r8), %edi 
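+/* __dexp2_la__imldExp2HATab holds 64 (value, correction) double pairs for 2^(j/64): index 2*j (%edi) selects the main value and 2*j+1 (%r9d) its low-order correction. */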
+ movsd 1064+__dexp2_la__imldExp2HATab(%rip), %xmm1 + subsd %xmm2, %xmm3 + mulsd %xmm3, %xmm1 + movsd (%r10,%rdi,8), %xmm4 + shrl $6, %ecx + addsd 1056+__dexp2_la__imldExp2HATab(%rip), %xmm1 + comisd 1080+__dexp2_la__imldExp2HATab(%rip), %xmm5 + mulsd %xmm3, %xmm1 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rdx + movq %rdx, -8(%rsp) + lea 1023(%rcx), %edx + addsd 1048+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd 1040+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd 1032+__dexp2_la__imldExp2HATab(%rip), %xmm1 + mulsd %xmm3, %xmm1 + addsd (%r10,%r9,8), %xmm1 + mulsd %xmm4, %xmm1 + addsd %xmm4, %xmm1 + jb .LBL_2_8 + andl $2047, %edx + cmpl $2046, %edx + ja .LBL_2_7 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rcx + shrq $48, %rcx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_7: + decl %edx + andl $2047, %edx + movzwl -2(%rsp), %ecx + shll $4, %edx + andl $-32753, %ecx + orl %edx, %ecx + movw %cx, -2(%rsp) + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + mulsd 1128+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_8: + addl $1123, %ecx + andl $2047, %ecx + movzwl -2(%rsp), %eax + shll $4, %ecx + andl $-32753, %eax + orl %ecx, %eax + movw %ax, -2(%rsp) + movl $4, %eax + movsd -8(%rsp), %xmm0 + mulsd %xmm1, %xmm0 + mulsd 1136+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_9: + movsd 1096+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movl $4, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_10: + movsd 1104+__dexp2_la__imldExp2HATab(%rip), %xmm0 + movl $3, %eax + mulsd %xmm0, %xmm0 + movsd %xmm0, (%rsi) + ret + +.LBL_2_12: + movb -1(%rsp), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_14 + +.LBL_2_13: + mulsd %xmm5, %xmm5 + movsd %xmm5, (%rsi) + ret + +.LBL_2_14: + testl $1048575, -4(%rsp) + jne .LBL_2_13 + cmpl $0, -8(%rsp) + jne .LBL_2_13 + movq 1112+__dexp2_la__imldExp2HATab(%rip), %rdx + movq %rdx, (%rsi) + ret + + cfi_endproc + + .type __svml_dexp2_cout_rare_internal,@function + .size __svml_dexp2_cout_rare_internal,.-__svml_dexp2_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dexp2_data_internal_avx512: + .long 0 + .long 1072693248 + .long 1828292879 + .long 1072739672 + .long 1014845819 + .long 1072788152 + .long 1853186616 + .long 1072838778 + .long 171030293 + .long 1072891646 + .long 1276261410 + .long 1072946854 + .long 3577096743 + .long 1073004506 + .long 3712504873 + .long 1073064711 + .long 1719614413 + .long 1073127582 + .long 1944781191 + .long 1073193236 + .long 1110089947 + .long 1073261797 + .long 2191782032 + .long 1073333393 + .long 2572866477 + .long 1073408159 + .long 3716502172 + .long 1073486235 + .long 3707479175 + .long 1073567768 + .long 2728693978 + .long 1073652911 + .long 4277811595 + .long 1072049730 + .long 4277811595 + .long 1072049730 + .long 4277811595 + .long 1072049730 + .long 4277811595 + .long 1072049730 + .long 4277811595 + .long 1072049730 + .long 4277811595 + .long 1072049730 + .long 4277811595 + .long 1072049730 + .long 4277811595 + .long 1072049730 + .long 4286862682 + .long 1070514109 + .long 4286862682 + .long 1070514109 + .long 4286862682 + .long 1070514109 + .long 4286862682 + .long 1070514109 + .long 4286862682 + .long 1070514109 + .long 4286862682 + .long 1070514109 + .long 4286862682 + .long 1070514109 + .long 4286862682 + .long 1070514109 + .long 3568142009 + .long 1068264200 + .long 3568142009 + 
.long 1068264200 + .long 3568142009 + .long 1068264200 + .long 3568142009 + .long 1068264200 + .long 3568142009 + .long 1068264200 + .long 3568142009 + .long 1068264200 + .long 3568142009 + .long 1068264200 + .long 3568142009 + .long 1068264200 + .long 454500946 + .long 1065595565 + .long 454500946 + .long 1065595565 + .long 454500946 + .long 1065595565 + .long 454500946 + .long 1065595565 + .long 454500946 + .long 1065595565 + .long 454500946 + .long 1065595565 + .long 454500946 + .long 1065595565 + .long 454500946 + .long 1065595565 + .long 655609113 + .long 1062590279 + .long 655609113 + .long 1062590279 + .long 655609113 + .long 1062590279 + .long 655609113 + .long 1062590279 + .long 655609113 + .long 1062590279 + .long 655609113 + .long 1062590279 + .long 655609113 + .long 1062590279 + .long 655609113 + .long 1062590279 + .long 4112922395 + .long 1059365335 + .long 4112922395 + .long 1059365335 + .long 4112922395 + .long 1059365335 + .long 4112922395 + .long 1059365335 + .long 4112922395 + .long 1059365335 + .long 4112922395 + .long 1059365335 + .long 4112922395 + .long 1059365335 + .long 4112922395 + .long 1059365335 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 0 + .long 1123549184 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 0 + .long 1083174911 + .long 0 + .long 1083174911 + .long 0 + .long 1083174911 + .long 0 + .long 1083174911 + .long 0 + .long 1083174911 + .long 0 + .long 1083174911 + .long 0 + .long 1083174911 + .long 0 + .long 1083174911 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .long 15 + .long 0 + .type __svml_dexp2_data_internal_avx512,@object + .size __svml_dexp2_data_internal_avx512,768 + .align 32 + +__dexp2_la__imldExp2HATab: + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 1048019041 + .long 1072704666 + .long 2631457885 + .long 3161546771 + .long 3541402996 + .long 1072716208 + .long 896005651 + .long 1015861842 + .long 410360776 + .long 1072727877 + .long 1642514529 + .long 1012987726 + .long 1828292879 + .long 1072739672 + .long 1568897901 + .long 1016568486 + .long 852742562 + .long 1072751596 + .long 1882168529 + .long 1010744893 + .long 3490863953 + .long 1072763649 + .long 707771662 + .long 3163903570 + .long 2930322912 + .long 1072775834 + .long 3117806614 + .long 3163670819 + .long 1014845819 + .long 1072788152 + .long 3936719688 + .long 3162512149 + .long 3949972341 + .long 1072800603 + .long 1058231231 + .long 1015777676 + .long 828946858 + .long 1072813191 + .long 1044000608 + .long 1016786167 + .long 2288159958 + .long 1072825915 + .long 1151779725 + .long 1015705409 + .long 1853186616 + .long 1072838778 + .long 3819481236 + .long 1016499965 + .long 1709341917 + .long 1072851781 + .long 2552227826 + .long 1015039787 + .long 4112506593 + .long 1072864925 + .long 1829350193 + .long 1015216097 + .long 2799960843 + .long 1072878213 + .long 1913391796 + .long 1015756674 + .long 171030293 + .long 1072891646 + .long 1303423926 + .long 1015238005 + .long 2992903935 + .long 1072905224 + .long 1574172746 + .long 1016061241 + 
.long 926591435 + .long 1072918951 + .long 3427487848 + .long 3163704045 + .long 887463927 + .long 1072932827 + .long 1049900754 + .long 3161575912 + .long 1276261410 + .long 1072946854 + .long 2804567149 + .long 1015390024 + .long 569847338 + .long 1072961034 + .long 1209502043 + .long 3159926671 + .long 1617004845 + .long 1072975368 + .long 1623370769 + .long 1011049453 + .long 3049340112 + .long 1072989858 + .long 3667985273 + .long 1013894369 + .long 3577096743 + .long 1073004506 + .long 3145379760 + .long 1014403278 + .long 1990012071 + .long 1073019314 + .long 7447438 + .long 3163526196 + .long 1453150082 + .long 1073034283 + .long 3171891295 + .long 3162037958 + .long 917841882 + .long 1073049415 + .long 419288974 + .long 1016280325 + .long 3712504873 + .long 1073064711 + .long 3793507337 + .long 1016095713 + .long 363667784 + .long 1073080175 + .long 728023093 + .long 1016345318 + .long 2956612997 + .long 1073095806 + .long 1005538728 + .long 3163304901 + .long 2186617381 + .long 1073111608 + .long 2018924632 + .long 3163803357 + .long 1719614413 + .long 1073127582 + .long 3210617384 + .long 3163796463 + .long 1013258799 + .long 1073143730 + .long 3094194670 + .long 3160631279 + .long 3907805044 + .long 1073160053 + .long 2119843535 + .long 3161988964 + .long 1447192521 + .long 1073176555 + .long 508946058 + .long 3162904882 + .long 1944781191 + .long 1073193236 + .long 3108873501 + .long 3162190556 + .long 919555682 + .long 1073210099 + .long 2882956373 + .long 1013312481 + .long 2571947539 + .long 1073227145 + .long 4047189812 + .long 3163777462 + .long 2604962541 + .long 1073244377 + .long 3631372142 + .long 3163870288 + .long 1110089947 + .long 1073261797 + .long 3253791412 + .long 1015920431 + .long 2568320822 + .long 1073279406 + .long 1509121860 + .long 1014756995 + .long 2966275557 + .long 1073297207 + .long 2339118633 + .long 3160254904 + .long 2682146384 + .long 1073315202 + .long 586480042 + .long 3163702083 + .long 2191782032 + .long 1073333393 + .long 730975783 + .long 1014083580 + .long 2069751141 + .long 1073351782 + .long 576856675 + .long 3163014404 + .long 2990417245 + .long 1073370371 + .long 3552361237 + .long 3163667409 + .long 1434058175 + .long 1073389163 + .long 1853053619 + .long 1015310724 + .long 2572866477 + .long 1073408159 + .long 2462790535 + .long 1015814775 + .long 3092190715 + .long 1073427362 + .long 1457303226 + .long 3159737305 + .long 4076559943 + .long 1073446774 + .long 950899508 + .long 3160987380 + .long 2420883922 + .long 1073466398 + .long 174054861 + .long 1014300631 + .long 3716502172 + .long 1073486235 + .long 816778419 + .long 1014197934 + .long 777507147 + .long 1073506289 + .long 3507050924 + .long 1015341199 + .long 3706687593 + .long 1073526560 + .long 1821514088 + .long 1013410604 + .long 1242007932 + .long 1073547053 + .long 1073740399 + .long 3163532637 + .long 3707479175 + .long 1073567768 + .long 2789017511 + .long 1014276997 + .long 64696965 + .long 1073588710 + .long 3586233004 + .long 1015962192 + .long 863738719 + .long 1073609879 + .long 129252895 + .long 3162690849 + .long 3884662774 + .long 1073631278 + .long 1614448851 + .long 1014281732 + .long 2728693978 + .long 1073652911 + .long 2413007344 + .long 3163551506 + .long 3999357479 + .long 1073674779 + .long 1101668360 + .long 1015989180 + .long 1533953344 + .long 1073696886 + .long 835814894 + .long 1015702697 + .long 2174652632 + .long 1073719233 + .long 1301400989 + .long 1014466875 + .long 0 + .long 1121452032 + .long 4277811695 + .long 1072049730 + .long 
4286751290 + .long 1070514109 + .long 3607585384 + .long 1068264200 + .long 871937163 + .long 1065595565 + .long 3302507530 + .long 1062590576 + .long 0 + .long 1083179008 + .long 0 + .long 3230658560 + .long 0 + .long 3230714880 + .long 1 + .long 1048576 + .long 4294967295 + .long 2146435071 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 0 + .long 967835648 + .long 0 + .long 0 + .type __dexp2_la__imldExp2HATab,@object + .size __dexp2_la__imldExp2HATab,1152 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S new file mode 100644 index 0000000000..4daa687852 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized exp2f, vector length is 16. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVeN16v_exp2f _ZGVeN16v_exp2f_avx2_wrapper +#include "../svml_s_exp2f16_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c new file mode 100644 index 0000000000..e90d9d8684 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized exp2f, vector length is 16. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define SYMBOL_NAME _ZGVeN16v_exp2f +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16v_exp2f, __GI__ZGVeN16v_exp2f, + __redirect__ZGVeN16v_exp2f) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S new file mode 100644 index 0000000000..7848ebca87 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S @@ -0,0 +1,511 @@ +/* Function exp2f vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library.
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +/* + * ALGORITHM DESCRIPTION: + * + * Single precision mantissa represented as: 1.b1b2b3 ... b23 + * Constant for single precision: S = 2^19 x 1.5 + * + * 2^X = 2^Xo x 2^{X-Xo} + * 2^X = 2^K x 2^fo x 2^{X-Xo} + * 2^X = 2^K x 2^fo x 2^r + * + * 2^K --> Manual scaling + * 2^fo --> Table lookup + * r --> 1 + poly (r = X - Xo) + * + * Xo = K + fo + * Xo = K + 0.x1x2x3x4 + * + * r = X - Xo + * = Vreduce(X, imm) + * = X - VRndScale(X, imm), where Xo = VRndScale(X, imm) + * + * Rnd(S + X) = S + Xo, where S is selected as S = 2^19 x 1.5 + * S + X = S + floor(X) + 0.x1x2x3x4 + * Rnd(S + X) = Rnd(2^19 x 1.5 + X) + * (Note: 2^exp x 1.b1b2b3 ... b23, 2^{exp-23} = 2^-4 for exp=19) + * + * exp2(x) = 2^K x 2^fo x (1 + poly(r)), where 2^r = 1 + poly(r) + * + * Scale back: + * dest = src1 x 2^floor(src2) + * + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVeN16v_exp2f_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $192, %rsp + vmovups 256+__svml_sexp2_data_internal_avx512(%rip), %zmm3 + +/* + * Reduced argument + * where VREDUCE is available + */ + vreduceps $65, {sae}, %zmm0, %zmm6 + vmovups 192+__svml_sexp2_data_internal_avx512(%rip), %zmm5 + vmovups 128+__svml_sexp2_data_internal_avx512(%rip), %zmm10 + vmovups 384+__svml_sexp2_data_internal_avx512(%rip), %zmm2 + +/* + * + * HA + * Variables and constants + * Load constants and vector(s) + */ + vmovups 64+__svml_sexp2_data_internal_avx512(%rip), %zmm7 + +/* + * Integer form of K+0.b1b2b3b4 in lower bits - call K_plus_f0 + * Mantissa of normalized single precision FP: 1.b1b2...b23 + */ + vaddps {rd-sae}, %zmm3, %zmm0, %zmm4 + vandps 320+__svml_sexp2_data_internal_avx512(%rip), %zmm0, %zmm1 + +/* c3*r + c2 */ + vfmadd231ps {rn-sae}, %zmm6, %zmm5, %zmm10 + vcmpps $30, {sae}, %zmm2, %zmm1, %k0 + +/* c3*r^2 + c2*r + c1 */ + vfmadd213ps {rn-sae}, %zmm7, %zmm6, %zmm10 + +/* Table value: 2^(0.b1b2b3b4) */ + vpermps __svml_sexp2_data_internal_avx512(%rip), %zmm4, %zmm9 + kmovw %k0, %edx + +/* T*r */ + vmulps {rn-sae}, %zmm6, %zmm9, %zmm8 + +/* T + (T*r*(c3*r^2 + c2*r + c1)) */ + vfmadd213ps {rn-sae}, %zmm9, %zmm8, %zmm10 + +/* Scaling placed at the end to avoid accuracy loss when T*r*scale underflows */ + vscalefps {rn-sae}, %zmm0, %zmm10, %zmm1 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovaps %zmm1, %zmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm0, 64(%rsp) + vmovups %zmm1, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05,
0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $16, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), %k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 128(%rsp), %zmm1 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x68, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x78, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x70, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + call __svml_sexp2_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN16v_exp2f_skx) + + .align 16,0x90 + +__svml_sexp2_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 2(%rdi), %edx + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_10 + movss (%rdi), %xmm7 + movss .FLT_43(%rip), %xmm0 + movl %eax, -12(%rsp) + comiss %xmm7, %xmm0 + jb .LBL_2_8 + comiss .FLT_44(%rip), %xmm7 + jb .LBL_2_7 + movaps %xmm7, %xmm0 + movaps %xmm7, %xmm5 + movss %xmm0, -24(%rsp) + movss -24(%rsp), %xmm1 + movss .FLT_46(%rip), %xmm2 + movss .FLT_52(%rip), %xmm6 + addss %xmm2, %xmm1 + movss %xmm1, -20(%rsp) + movss -20(%rsp), %xmm3 + movswl -20(%rsp), %edx + subss %xmm2, %xmm3 + movss %xmm3, -24(%rsp) + movss -24(%rsp), %xmm4 + subss %xmm4, %xmm5 + mulss %xmm5, %xmm6 + addss .FLT_51(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_50(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_49(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_48(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_47(%rip), %xmm6 + movss %xmm6, -16(%rsp) + cmpl $104, %edx + jl .LBL_2_6 + movzbl %dl, %edx + shll $7, %edx + movw %dx, -10(%rsp) + movss -16(%rsp), %xmm0 + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 
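+/* The word written at -10(%rsp) makes -12(%rsp) the float 2^(n-127); the multiply by .FLT_54 (0x7f000000 = 2^127) below then completes the scaling by 2^n in two steps. */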
+ mulss .FLT_54(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_6: + addl $-106, %edx + cmpltss .FLT_56(%rip), %xmm7 + movzbl %dl, %edx + movzwl -10(%rsp), %eax + shll $7, %edx + andl $-32641, %eax + orl %edx, %eax + movss -16(%rsp), %xmm0 + movw %ax, -10(%rsp) + mulss -12(%rsp), %xmm0 + movd %xmm7, %eax + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + andl $4, %eax + mulss .FLT_55(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_7: + movss .FLT_53(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $4, %eax + ret + +.LBL_2_8: + movss .FLT_45(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $3, %eax + ret + +.LBL_2_10: + movb 3(%rdi), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_12 + +.LBL_2_11: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%rsi) + ret + +.LBL_2_12: + testl $8388607, (%rdi) + jne .LBL_2_11 + movl %eax, (%rsi) + ret + + cfi_endproc + + .type __svml_sexp2_cout_rare_internal,@function + .size __svml_sexp2_cout_rare_internal,.-__svml_sexp2_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_sexp2_data_internal_avx512: + .long 1065353216 + .long 1065724611 + .long 1066112450 + .long 1066517459 + .long 1066940400 + .long 1067382066 + .long 1067843287 + .long 1068324927 + .long 1068827891 + .long 1069353124 + .long 1069901610 + .long 1070474380 + .long 1071072509 + .long 1071697119 + .long 1072349383 + .long 1073030525 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1060205090 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1047916907 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1030247626 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 1228931072 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .type __svml_sexp2_data_internal_avx512,@object + .size __svml_sexp2_data_internal_avx512,448 + .align 4 + +.FLT_43: + .long 
0x43000000 + .type .FLT_43,@object + .size .FLT_43,4 + .align 4 + +.FLT_44: + .long 0xc3160000 + .type .FLT_44,@object + .size .FLT_44,4 + .align 4 + +.FLT_45: + .long 0x7f7fffff + .type .FLT_45,@object + .size .FLT_45,4 + .align 4 + +.FLT_46: + .long 0x4b400000 + .type .FLT_46,@object + .size .FLT_46,4 + .align 4 + +.FLT_47: + .long 0x3f800001 + .type .FLT_47,@object + .size .FLT_47,4 + .align 4 + +.FLT_48: + .long 0x3f317219 + .type .FLT_48,@object + .size .FLT_48,4 + .align 4 + +.FLT_49: + .long 0x3e75fc83 + .type .FLT_49,@object + .size .FLT_49,4 + .align 4 + +.FLT_50: + .long 0x3d635716 + .type .FLT_50,@object + .size .FLT_50,4 + .align 4 + +.FLT_51: + .long 0x3c1e883d + .type .FLT_51,@object + .size .FLT_51,4 + .align 4 + +.FLT_52: + .long 0x3aafc483 + .type .FLT_52,@object + .size .FLT_52,4 + .align 4 + +.FLT_53: + .long 0x00000001 + .type .FLT_53,@object + .size .FLT_53,4 + .align 4 + +.FLT_54: + .long 0x7f000000 + .type .FLT_54,@object + .size .FLT_54,4 + .align 4 + +.FLT_55: + .long 0x34000000 + .type .FLT_55,@object + .size .FLT_55,4 + .align 4 + +.FLT_56: + .long 0xc2fc0000 + .type .FLT_56,@object + .size .FLT_56,4 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S new file mode 100644 index 0000000000..0b3fec834c --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized exp2f, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVbN4v_exp2f _ZGVbN4v_exp2f_sse2 +#include "../svml_s_exp2f4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c new file mode 100644 index 0000000000..db47118d97 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized exp2f, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>.
*/ + +#define SYMBOL_NAME _ZGVbN4v_exp2f +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4v_exp2f, __GI__ZGVbN4v_exp2f, + __redirect__ZGVbN4v_exp2f) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S new file mode 100644 index 0000000000..4c3e43be97 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S @@ -0,0 +1,923 @@ +/* Function exp2f vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +/* + * ALGORITHM DESCRIPTION: + * + * exp2(x) = 2^n * T[j] * (1 + P(y)) + * where + * x = m*(1/K) + y, y in [-1/K..1/K] + * m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + * values of 2^j/K are tabulated + * + * P(y) is a minimax polynomial approximation of exp2(x)-1 + * on small interval [-1/K..1/K] + * + * Special cases: + * + * exp2(NaN) = NaN + * exp2(+INF) = +INF + * exp2(-INF) = 0 + * exp2(x) = 1 for subnormals + * For IEEE float + * if x >= 128.0 then exp2f(x) overflow + * if x < -151.0 then exp2f(x) underflow + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVbN4v_exp2f_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + +/* Check for overflow/underflow */ + movups __svml_sexp2_data_internal(%rip), %xmm1 + +/* Implementation */ + movaps %xmm1, %xmm5 + +/* Polynomial */ + movups 448+__svml_sexp2_data_internal(%rip), %xmm4 + addps %xmm0, %xmm5 + movaps %xmm5, %xmm3 + +/* 2^N */ + pslld $23, %xmm5 + +/* Check for overflow/underflow */ + movdqu 512+__svml_sexp2_data_internal(%rip), %xmm2 + subps %xmm1, %xmm3 + +/* R */ + movaps %xmm0, %xmm1 + pand %xmm0, %xmm2 + pcmpgtd 576+__svml_sexp2_data_internal(%rip), %xmm2 + subps %xmm3, %xmm1 + movmskps %xmm2, %edx + mulps %xmm1, %xmm4 + addps 384+__svml_sexp2_data_internal(%rip), %xmm4 + mulps %xmm1, %xmm4 + addps 320+__svml_sexp2_data_internal(%rip), %xmm4 + mulps %xmm1, %xmm4 + addps 256+__svml_sexp2_data_internal(%rip), %xmm4 + mulps %xmm1, %xmm4 + addps 192+__svml_sexp2_data_internal(%rip), %xmm4 + mulps %xmm1, %xmm4 + addps 128+__svml_sexp2_data_internal(%rip), %xmm4 + mulps %xmm4, %xmm1 + addps 64+__svml_sexp2_data_internal(%rip), %xmm1 + +/* Reconstruction */ + paddd %xmm5, %xmm1 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + movaps %xmm1, %xmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm0, 192(%rsp) + movups %xmm1, 256(%rsp) + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13,
32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm1 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 
0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,4), %rdi + lea 256(%rsp,%r12,4), %rsi + call __svml_sexp2_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN4v_exp2f_sse4) + + .align 16,0x90 + +__svml_sexp2_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 2(%rdi), %edx + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_10 + movss (%rdi), %xmm7 + movss .FLT_46(%rip), %xmm0 + movl %eax, -12(%rsp) + comiss %xmm7, %xmm0 + jb .LBL_2_8 + comiss .FLT_47(%rip), %xmm7 + jb .LBL_2_7 + movaps %xmm7, %xmm0 + movaps %xmm7, %xmm5 + movss %xmm0, -24(%rsp) + movss -24(%rsp), %xmm1 + movss .FLT_49(%rip), %xmm2 + movss .FLT_55(%rip), %xmm6 + addss %xmm2, %xmm1 + movss %xmm1, -20(%rsp) + movss -20(%rsp), %xmm3 + movswl -20(%rsp), %edx + subss %xmm2, %xmm3 + movss %xmm3, -24(%rsp) + movss -24(%rsp), %xmm4 + subss %xmm4, %xmm5 + mulss %xmm5, %xmm6 + addss .FLT_54(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_53(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_52(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_51(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_50(%rip), %xmm6 + movss %xmm6, -16(%rsp) + cmpl $104, %edx + jl .LBL_2_6 + movzbl %dl, %edx + shll $7, %edx + movw %dx, -10(%rsp) + movss -16(%rsp), %xmm0 + mulss -12(%rsp), %xmm0 + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_57(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_6: + addl $-106, %edx + cmpltss .FLT_59(%rip), %xmm7 + movzbl %dl, %edx + movzwl -10(%rsp), %eax + shll $7, %edx + andl $-32641, %eax + orl %edx, %eax + movss -16(%rsp), %xmm0 + movw %ax, -10(%rsp) + mulss -12(%rsp), %xmm0 + movd %xmm7, %eax + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + andl $4, %eax + mulss .FLT_58(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_7: + movss .FLT_56(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $4, %eax + ret + +.LBL_2_8: + movss .FLT_48(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $3, %eax + ret + +.LBL_2_10: + movb 3(%rdi), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_12 + +.LBL_2_11: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%rsi) + ret + +.LBL_2_12: + testl $8388607, (%rdi) + jne .LBL_2_11 + movl %eax, (%rsi) + ret + + cfi_endproc + + .type __svml_sexp2_cout_rare_internal,@function + .size __svml_sexp2_cout_rare_internal,.-__svml_sexp2_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_sexp2_data_internal: + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 984578641 + .long 984578641 + .long 984578641 + .long 984578641 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 958479500 + .long 958479500 + .long 958479500 + .long 958479500 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 
+ .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_sexp2_data_internal,@object + .size __svml_sexp2_data_internal,640 + .align 4 + +.FLT_46: + .long 0x43000000 + .type .FLT_46,@object + .size .FLT_46,4 + .align 4 + +.FLT_47: + .long 0xc3160000 + .type .FLT_47,@object + .size .FLT_47,4 + .align 4 + +.FLT_48: + .long 0x7f7fffff + .type .FLT_48,@object + .size .FLT_48,4 + .align 4 + +.FLT_49: + .long 0x4b400000 + .type .FLT_49,@object + .size .FLT_49,4 + .align 4 + +.FLT_50: + .long 0x3f800001 + .type .FLT_50,@object + .size .FLT_50,4 + .align 4 + +.FLT_51: + .long 0x3f317219 + .type .FLT_51,@object + .size .FLT_51,4 + .align 4 + +.FLT_52: + .long 0x3e75fc83 + .type .FLT_52,@object + .size .FLT_52,4 + .align 4 + +.FLT_53: + .long 0x3d635716 + .type .FLT_53,@object + .size .FLT_53,4 + .align 4 + +.FLT_54: + .long 0x3c1e883d + .type .FLT_54,@object + .size .FLT_54,4 + .align 4 + +.FLT_55: + .long 0x3aafc483 + .type .FLT_55,@object + .size .FLT_55,4 + .align 4 + +.FLT_56: + .long 0x00000001 + .type .FLT_56,@object + .size .FLT_56,4 + .align 4 + +.FLT_57: + .long 0x7f000000 + .type .FLT_57,@object + .size .FLT_57,4 + .align 4 + +.FLT_58: + .long 0x34000000 + .type .FLT_58,@object + .size .FLT_58,4 + .align 4 + +.FLT_59: + .long 0xc2fc0000 + .type .FLT_59,@object + .size .FLT_59,4 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S new file mode 100644 index 0000000000..4da2278ed8 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized exp2f, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>.
*/ + +#define _ZGVdN8v_exp2f _ZGVdN8v_exp2f_sse_wrapper +#include "../svml_s_exp2f8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c new file mode 100644 index 0000000000..dc34671263 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized exp2f, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define SYMBOL_NAME _ZGVdN8v_exp2f +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8v_exp2f, __GI__ZGVdN8v_exp2f, + __redirect__ZGVdN8v_exp2f) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S new file mode 100644 index 0000000000..d42dc0abc5 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S @@ -0,0 +1,795 @@ +/* Function exp2f vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>.
*/ + +/* + * ALGORITHM DESCRIPTION: + * + * exp2(x) = 2^n * T[j] * (1 + P(y)) + * where + * x = m*(1/K) + y, y in [-1/K..1/K] + * m = n*K + j, m,n,j - signed integer, j in [-K/2..K/2] + * + * values of 2^j/K are tabulated + * + * P(y) is a minimax polynomial approximation of exp2(x)-1 + * on small interval [-1/K..1/K] + * + * Special cases: + * + * exp2(NaN) = NaN + * exp2(+INF) = +INF + * exp2(-INF) = 0 + * exp2(x) = 1 for subnormals + * For IEEE float + * if x >= 128.0 then exp2f(x) overflow + * if x < -151.0 then exp2f(x) underflow + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVdN8v_exp2f_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + +/* Check for overflow/underflow */ + vmovups __svml_sexp2_data_internal(%rip), %ymm1 + vmovups %ymm8, 32(%rsp) + +/* Polynomial */ + vmovups 448+__svml_sexp2_data_internal(%rip), %ymm7 + vmovups %ymm15, 320(%rsp) + vmovups %ymm14, 288(%rsp) + vmovups %ymm13, 256(%rsp) + vmovups %ymm12, 224(%rsp) + vmovups %ymm11, 192(%rsp) + vmovups %ymm10, 160(%rsp) + vmovups %ymm9, 96(%rsp) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + vmovaps %ymm0, %ymm8 + +/* Implementation */ + vaddps %ymm1, %ymm8, %ymm6 + vsubps %ymm1, %ymm6, %ymm4 + +/* 2^N */ + vpslld $23, %ymm6, %ymm0 + +/* R */ + vsubps %ymm4, %ymm8, %ymm5 + vfmadd213ps 384+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7 + vfmadd213ps 320+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7 + vfmadd213ps 256+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7 + vfmadd213ps 192+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7 + vfmadd213ps 128+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7 + vfmadd213ps 64+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7 + +/* Check for overflow/underflow */ + vandps 512+__svml_sexp2_data_internal(%rip), %ymm8, %ymm2 + vpcmpgtd 576+__svml_sexp2_data_internal(%rip), %ymm2, %ymm3 + vmovmskps %ymm3, %edx + +/* Reconstruction */ + vpaddd %ymm0, %ymm7, %ymm0 + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 192(%rsp), %ymm11 + cfi_restore(94) + vmovups 224(%rsp), %ymm12 + cfi_restore(95) + vmovups 256(%rsp), %ymm13 + cfi_restore(96) + vmovups 288(%rsp), %ymm14 + cfi_restore(97) + vmovups 320(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 +
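/* The constant rows of __svml_sexp2_data_internal used above decode, as
   IEEE-754 single-precision bit patterns, to: the rounding shifter
   1.5*2^23 (0x4b400000), 1.0f, six polynomial coefficients close to
   (ln 2)^k / k! for k = 1..6, the sign-strip mask 0x7fffffff, and the
   fast-path bound 126.0f (0x42fc0000).  A rough scalar equivalent of the
   fast path, in illustrative C (the helper name and the rounded
   coefficient values below are not part of this patch; assumes
   round-to-nearest mode and in-range n):

	static float
	exp2f_fast_path_sketch (float x)
	{
	  const float shifter = 0x1.8p23f;	// 1.5 * 2^23
	  float n = (x + shifter) - shifter;	// x rounded to an integer
	  float r = x - n;			// reduced argument, |r| <= 0.5
	  // 2^r ~= 1 + c1*r + ... + c6*r^6; Horner form, one step per
	  // vfmadd213ps above.
	  float p = ((((((1.5403e-4f * r + 1.3333e-3f) * r + 9.6181e-3f) * r
			+ 5.5504e-2f) * r + 2.4023e-1f) * r + 6.9315e-1f) * r
		     + 1.0f);
	  // Scale by 2^n through the exponent field, as vpslld $23 plus
	  // vpaddd do on the whole vector.
	  union { float f; unsigned int u; } v = { .f = p };
	  v.u += (unsigned int) (int) n << 23;
	  return v.f;
	}

   Lanes whose |x| exceeds 126.0f fail the vandps/vpcmpgtd check and are
   recomputed one at a time by __svml_sexp2_cout_rare_internal.  */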
.cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovups %ymm8, 64(%rsp) + vmovups %ymm0, 128(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovups 128(%rsp), %ymm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + call __svml_sexp2_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVdN8v_exp2f_avx2) + + .align 16,0x90 + +__svml_sexp2_cout_rare_internal: + + cfi_startproc + + xorl %eax, %eax + movzwl 2(%rdi), %edx + andl $32640, %edx + cmpl $32640, %edx + je .LBL_2_10 + movss (%rdi), %xmm7 + movss .FLT_46(%rip), %xmm0 + movl %eax, -12(%rsp) + comiss %xmm7, %xmm0 + jb .LBL_2_8 + comiss .FLT_47(%rip), %xmm7 + jb .LBL_2_7 + movaps %xmm7, %xmm0 + movaps %xmm7, %xmm5 + movss %xmm0, -24(%rsp) + movss -24(%rsp), %xmm1 + movss .FLT_49(%rip), %xmm2 + movss .FLT_55(%rip), %xmm6 + addss %xmm2, %xmm1 + movss %xmm1, -20(%rsp) + movss -20(%rsp), %xmm3 + movswl -20(%rsp), %edx + subss %xmm2, %xmm3 + movss %xmm3, -24(%rsp) + movss -24(%rsp), %xmm4 + subss %xmm4, %xmm5 + mulss %xmm5, %xmm6 + addss .FLT_54(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_53(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_52(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_51(%rip), %xmm6 + mulss %xmm5, %xmm6 + addss .FLT_50(%rip), %xmm6 + movss %xmm6, -16(%rsp) + cmpl $104, %edx + jl .LBL_2_6 + movzbl %dl, %edx + shll $7, %edx + movw %dx, -10(%rsp) + movss -16(%rsp), %xmm0 + mulss -12(%rsp), %xmm0 + movss %xmm0, 
-16(%rsp) + movss -16(%rsp), %xmm1 + mulss .FLT_57(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_6: + addl $-106, %edx + cmpltss .FLT_59(%rip), %xmm7 + movzbl %dl, %edx + movzwl -10(%rsp), %eax + shll $7, %edx + andl $-32641, %eax + orl %edx, %eax + movss -16(%rsp), %xmm0 + movw %ax, -10(%rsp) + mulss -12(%rsp), %xmm0 + movd %xmm7, %eax + movss %xmm0, -16(%rsp) + movss -16(%rsp), %xmm1 + andl $4, %eax + mulss .FLT_58(%rip), %xmm1 + movss %xmm1, -16(%rsp) + movl -16(%rsp), %ecx + movl %ecx, (%rsi) + ret + +.LBL_2_7: + movss .FLT_56(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $4, %eax + ret + +.LBL_2_8: + movss .FLT_48(%rip), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, -16(%rsp) + movl -16(%rsp), %eax + movl %eax, (%rsi) + movl $3, %eax + ret + +.LBL_2_10: + movb 3(%rdi), %dl + andb $-128, %dl + cmpb $-128, %dl + je .LBL_2_12 + +.LBL_2_11: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%rsi) + ret + +.LBL_2_12: + testl $8388607, (%rdi) + jne .LBL_2_11 + movl %eax, (%rsi) + ret + + cfi_endproc + + .type __svml_sexp2_cout_rare_internal,@function + .size __svml_sexp2_cout_rare_internal,.-__svml_sexp2_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_sexp2_data_internal: + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .long 1060205080 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .long 1047920111 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .long 1029920719 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .long 1008571948 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 984578641 + .long 984578641 + .long 984578641 + .long 984578641 + .long 984578641 + .long 984578641 + .long 984578641 + .long 984578641 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 958479500 + .long 958479500 + .long 958479500 + .long 958479500 + .long 958479500 + .long 958479500 + .long 958479500 + .long 958479500 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .long 1123811328 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_sexp2_data_internal,@object + .size __svml_sexp2_data_internal,640 + .align 4 + +.FLT_46: + .long 0x43000000 + .type .FLT_46,@object + .size .FLT_46,4 + .align 4 + +.FLT_47: + .long 0xc3160000 + .type .FLT_47,@object + .size .FLT_47,4 + .align 4 + +.FLT_48: + .long 0x7f7fffff + .type .FLT_48,@object + .size .FLT_48,4 + .align 4 + +.FLT_49: + .long 0x4b400000 + .type .FLT_49,@object + .size .FLT_49,4 + .align 4 + +.FLT_50: + .long 0x3f800001 + .type .FLT_50,@object + .size .FLT_50,4 + .align 4 + +.FLT_51: + .long 0x3f317219 + .type .FLT_51,@object + .size .FLT_51,4 + .align 4 + +.FLT_52: + .long 0x3e75fc83 + .type .FLT_52,@object + .size .FLT_52,4 + .align 4 + +.FLT_53: + .long 0x3d635716 + .type .FLT_53,@object + .size .FLT_53,4 + .align 4 + +.FLT_54: + .long 0x3c1e883d + .type .FLT_54,@object + .size .FLT_54,4 + .align 4 + +.FLT_55: + .long 0x3aafc483 + .type .FLT_55,@object + .size .FLT_55,4 + .align 4 + +.FLT_56: + .long 0x00000001 + .type .FLT_56,@object + .size .FLT_56,4 + .align 4 + +.FLT_57: + .long 0x7f000000 + .type .FLT_57,@object + 
.size .FLT_57,4 + .align 4 + +.FLT_58: + .long 0x34000000 + .type .FLT_58,@object + .size .FLT_58,4 + .align 4 + +.FLT_59: + .long 0xc2fc0000 + .type .FLT_59,@object + .size .FLT_59,4 diff --git a/sysdeps/x86_64/fpu/svml_d_exp22_core.S b/sysdeps/x86_64/fpu/svml_d_exp22_core.S new file mode 100644 index 0000000000..f03080a977 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_exp22_core.S @@ -0,0 +1,29 @@ +/* Function exp2 vectorized with SSE2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVbN2v_exp2) +WRAPPER_IMPL_SSE2 exp2 +END (_ZGVbN2v_exp2) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN2v_exp2) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_exp24_core.S b/sysdeps/x86_64/fpu/svml_d_exp24_core.S new file mode 100644 index 0000000000..40475c7a94 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_exp24_core.S @@ -0,0 +1,29 @@ +/* Function exp2 vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVdN4v_exp2) +WRAPPER_IMPL_AVX _ZGVbN2v_exp2 +END (_ZGVdN4v_exp2) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN4v_exp2) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S b/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S new file mode 100644 index 0000000000..a7d22409df --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S @@ -0,0 +1,25 @@ +/* Function exp2 vectorized in AVX ISA as wrapper to SSE4 ISA version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details.
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVcN4v_exp2) +WRAPPER_IMPL_AVX _ZGVbN2v_exp2 +END (_ZGVcN4v_exp2) diff --git a/sysdeps/x86_64/fpu/svml_d_exp28_core.S b/sysdeps/x86_64/fpu/svml_d_exp28_core.S new file mode 100644 index 0000000000..f68aaed427 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_exp28_core.S @@ -0,0 +1,25 @@ +/* Function exp2 vectorized with AVX-512, wrapper to AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVeN8v_exp2) +WRAPPER_IMPL_AVX512 _ZGVdN4v_exp2 +END (_ZGVeN8v_exp2) diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S new file mode 100644 index 0000000000..8ba4e82272 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S @@ -0,0 +1,25 @@ +/* Function exp2f vectorized with AVX-512. Wrapper to AVX2 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVeN16v_exp2f) +WRAPPER_IMPL_AVX512 _ZGVdN8v_exp2f +END (_ZGVeN16v_exp2f) diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S new file mode 100644 index 0000000000..916f176dca --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S @@ -0,0 +1,29 @@ +/* Function exp2f vectorized with SSE2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details.
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVbN4v_exp2f) +WRAPPER_IMPL_SSE2 exp2f +END (_ZGVbN4v_exp2f) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN4v_exp2f) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S new file mode 100644 index 0000000000..b8821b952b --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S @@ -0,0 +1,29 @@ +/* Function exp2f vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVdN8v_exp2f) +WRAPPER_IMPL_AVX _ZGVbN4v_exp2f +END (_ZGVdN8v_exp2f) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN8v_exp2f) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S new file mode 100644 index 0000000000..ddaaf3b59a --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S @@ -0,0 +1,25 @@ +/* Function exp2f vectorized in AVX ISA as wrapper to SSE4 ISA version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>.
*/ + +#include <sysdep.h> +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVcN8v_exp2f) +WRAPPER_IMPL_AVX _ZGVbN4v_exp2f +END (_ZGVcN8v_exp2f) diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c new file mode 100644 index 0000000000..341ec99724 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c @@ -0,0 +1 @@ +#include "test-double-libmvec-exp2.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c new file mode 100644 index 0000000000..341ec99724 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c @@ -0,0 +1 @@ +#include "test-double-libmvec-exp2.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c new file mode 100644 index 0000000000..341ec99724 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c @@ -0,0 +1 @@ +#include "test-double-libmvec-exp2.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c new file mode 100644 index 0000000000..b3b04f63e4 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE double +#define LIBMVEC_FUNC exp2 +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c index 90eb5c3205..626d486baf 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh) VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf) VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVbN2v_erfc) VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10) +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c index 5a960d003e..16b4160910 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh) VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf) VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVdN4v_erfc) VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10) +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2) #ifndef __ILP32__ # define VEC_INT_TYPE __m256i diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c index 37d9ce9cb7..76e2acf8c9 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh) VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf) VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVcN4v_erfc) VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10) +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c index d61ab42881..ee71110aca 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh) VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf) VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVeN8v_erfc) VECTOR_WRAPPER (WRAPPER_NAME (exp10),
_ZGVeN8v_exp10) +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2) #ifndef __ILP32__ # define VEC_INT_TYPE __m512i diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c new file mode 100644 index 0000000000..0281d386fb --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c @@ -0,0 +1 @@ +#include "test-float-libmvec-exp2f.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c new file mode 100644 index 0000000000..0281d386fb --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c @@ -0,0 +1 @@ +#include "test-float-libmvec-exp2f.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c new file mode 100644 index 0000000000..0281d386fb --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c @@ -0,0 +1 @@ +#include "test-float-libmvec-exp2f.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c new file mode 100644 index 0000000000..bf57661bee --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE float +#define LIBMVEC_FUNC exp2f +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c index 1470b4d890..b9a5abb780 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf) VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff) VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVeN16v_erfcf) VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f) +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f) #define VEC_INT_TYPE __m512i diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c index 7833482006..f25757a975 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf) VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff) VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVbN4v_erfcf) VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f) +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f) #define VEC_INT_TYPE __m128i diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c index 6719c1ae56..c589504eb3 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf) VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff) VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVdN8v_erfcf) VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f) +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f) /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf. 
*/ #undef VECTOR_WRAPPER_fFF diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c index 40b852aad6..dc116f501d 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf) VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff) VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVcN8v_erfcf) VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f) +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f) #define VEC_INT_TYPE __m128i
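Note on usage: as with the existing libmvec entry points, nothing calls the _ZGV* symbols above directly; they are reached when the compiler vectorizes scalar exp2/exp2f calls through the declarations this patch adds to bits/math-vector.h. A minimal smoke test, assuming a GCC recent enough to honor the simd attribute (the file name and flags here are illustrative, not from this patch):

/* exp2-vec-smoke.c: build with
     gcc -O3 -mavx2 -ffast-math exp2-vec-smoke.c -lmvec -lm
   then inspect the disassembly for calls such as _ZGVdN4v_exp2 and
   _ZGVdN8v_exp2f.  */
#include <math.h>

#define N 1024
double in[N], out[N];
float inf[N], outf[N];

void
run (void)
{
  for (int i = 0; i < N; i++)
    out[i] = exp2 (in[i]);	/* may be vectorized to _ZGVdN4v_exp2.  */
  for (int i = 0; i < N; i++)
    outf[i] = exp2f (inf[i]);	/* may be vectorized to _ZGVdN8v_exp2f.  */
}

The test-*-libmvec-exp2* files added above exercise the same path through the shared test-vector-abi-arg1.h harness.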