From patchwork Thu Dec 30 00:00:03 2021
X-Patchwork-Submitter: Sunil Pandey
X-Patchwork-Id: 49403
To: libc-alpha@sourceware.org
Subject: [PATCH v3 0/1] Add vector math function
 erfc/erfcf to libmvec
Date: Wed, 29 Dec 2021 16:00:03 -0800
Message-Id: <20211230000004.3894570-1-skpgkp2@gmail.com>
X-Patchwork-Original-From: Sunil K Pandey via Libc-alpha
From: Sunil Pandey
Reply-To: Sunil K Pandey
Cc: andrey.kolesov@intel.com, marius.cornea@intel.com

This patch may look big, but 82% of it is data tables.

Changes from v2:
- Replace big negative rip offset with Table Lookup Bias.
- Remove more unused data table fields.
- Include LOE (live on exit) register info.
- Apply more peephole optimization.
- Optimize load of all bits set into ZMM register.
- Replace 3 kmovw + andl with kandw instruction.
- Restructure data table and remove unused fields.
- Fix data table and field alignment according to ISA.
- Fix data offset according to ISA.
- Remove exit call dead code.
- Remove unnecessary save/restore.
- Keep cfi_escape for callee saved registers only.
- Add DW_CFA_expression comments corresponding to each cfi_escape.
- Define macro corresponding to each numeric data table offset.
- Replace numeric data table offset with macro name.
- Add data table structure definition as comments.
- Restructure data table and add comments to each data field value.
- Rename numeric sequential labels with meaningful label names.
- Add more comments to labels as well as on call sites.
- Replace internal special-value processing paths with calls to standard
  scalar math functions, which makes the code more compact and aligns it
  with previous libmvec submissions.

Changes from v1:
- Add ISA specific sections for all libmvec functions.
- Add libmvec functions to math-vector-fortran.h.
- Change label to sequential.
- Fix function name in GNU header plate.

This patch implements the erfc/erfcf vector math functions for libmvec,
with SSE, AVX, AVX2 and AVX512 versions as per the vector ABI. It also
contains accuracy and ABI tests with regenerated ulps.

Sunil K Pandey (1):
  x86-64: Add vector erfc/erfcf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_erfc2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc2_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc2_core_sse4.S    | 3853 ++++++++++++++++
 .../fpu/multiarch/svml_d_erfc4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc4_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc4_core_avx2.S    | 3857 ++++++++++++++++
 .../fpu/multiarch/svml_d_erfc8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc8_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc8_core_avx512.S  | 3860 +++++++++++++++++
 .../fpu/multiarch/svml_s_erfcf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_erfcf16_core.c       |   28 +
 .../multiarch/svml_s_erfcf16_core_avx512.S    |  932 ++++
 .../fpu/multiarch/svml_s_erfcf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_erfcf4_core.c |   28 +
 .../fpu/multiarch/svml_s_erfcf4_core_sse4.S   |  939 ++++
 .../fpu/multiarch/svml_s_erfcf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_erfcf8_core.c |   28 +
 .../fpu/multiarch/svml_s_erfcf8_core_avx2.S   |  957 ++++
 sysdeps/x86_64/fpu/svml_d_erfc2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_erfc4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_erfc4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_erfc8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_s_erfcf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_erfcf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_erfcf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_erfcf8_core_avx.S   |   25 +
 .../x86_64/fpu/test-double-libmvec-erfc-avx.c |    1 +
 .../fpu/test-double-libmvec-erfc-avx2.c       |    1 +
 .../fpu/test-double-libmvec-erfc-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-erfc.c |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-erfcf-avx.c |    1 +
 .../fpu/test-float-libmvec-erfcf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-erfcf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-erfcf.c |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 14970 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf.c