From patchwork Thu Dec 30 00:03:57 2021
X-Patchwork-Submitter: Sunil Pandey
X-Patchwork-Id: 49405
From: Sunil Pandey
Reply-To: Sunil K Pandey
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com, marius.cornea@intel.com
Subject: [PATCH v3 0/1] Add vector math function tan/tanf to libmvec
Date: Wed, 29 Dec 2021 16:03:57 -0800
Message-Id: <20211230000358.3894697-1-skpgkp2@gmail.com>

This patch may look big, but 74% of it is data tables.

Changes from v2:

- Replace big negative rip offset with Table Lookup Bias.
- Remove more unused data table fields.
- Include LOE (live on exit) register info.
- Apply more peephole optimization.
- Optimize load of all bits set into ZMM register.
- Replace 3 kmovw + andl with kandw instruction.
- Restructure data table and remove unused fields.
- Fix data table and field alignment according to ISA.
- Fix data offset according to ISA.
- Remove exit call dead code.
- Remove unnecessary save/restore.
- Keep cfi_escape for callee saved registers only.
- Add DW_CFA_expression comments corresponding to each cfi_escape.
- Define macro corresponding to each numeric data table offset.
- Replace numeric data table offset with macro name.
- Add data table structure definition as comments.
- Restructure data table and add comments to each data field value.
- Rename numeric sequential labels with meaningful label names.
- Add more comments to labels as well as on call sites.
- Replace internal special-value processing paths with calls to
  standard scalar math functions, which makes the code more compact
  and aligns it with previous libmvec submissions.

Changes from v1:

- Add ISA specific sections for all libmvec functions.
- Add libmvec functions to math-vector-fortran.h.
- Change labels to sequential.
- Fix function name in GNU header plate.

This patch implements the tan/tanf vector math functions for libmvec
in SSE, AVX, AVX2 and AVX512 versions, as per the vector ABI.  It also
contains accuracy and ABI tests with regenerated ulps.

Sunil K Pandey (1):
  x86-64: Add vector tan/tanf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_tan2_core-sse2.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_tan2_core.c   |   27 +
 .../fpu/multiarch/svml_d_tan2_core_sse4.S     | 6259 +++++++++++++++++
 .../fpu/multiarch/svml_d_tan4_core-sse.S      |   20 +
 .../x86_64/fpu/multiarch/svml_d_tan4_core.c   |   27 +
 .../fpu/multiarch/svml_d_tan4_core_avx2.S     | 6227 ++++++++++++++++
 .../fpu/multiarch/svml_d_tan8_core-avx2.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_tan8_core.c   |   27 +
 .../fpu/multiarch/svml_d_tan8_core_avx512.S   | 2733 +++++++
 .../fpu/multiarch/svml_s_tanf16_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanf16_core.c |   28 +
 .../fpu/multiarch/svml_s_tanf16_core_avx512.S |  927 +++
 .../fpu/multiarch/svml_s_tanf4_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanf4_core.c  |   28 +
 .../fpu/multiarch/svml_s_tanf4_core_sse4.S    | 2600 +++++++
 .../fpu/multiarch/svml_s_tanf8_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanf8_core.c  |   28 +
 .../fpu/multiarch/svml_s_tanf8_core_avx2.S    | 2595 +++++++
 sysdeps/x86_64/fpu/svml_d_tan2_core.S         |   29 +
 sysdeps/x86_64/fpu/svml_d_tan4_core.S         |   29 +
 sysdeps/x86_64/fpu/svml_d_tan4_core_avx.S     |   25 +
 sysdeps/x86_64/fpu/svml_d_tan8_core.S         |   25 +
 sysdeps/x86_64/fpu/svml_s_tanf16_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_tanf4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_s_tanf8_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_s_tanf8_core_avx.S    |   25 +
 .../x86_64/fpu/test-double-libmvec-tan-avx.c  |    1 +
 .../x86_64/fpu/test-double-libmvec-tan-avx2.c |    1 +
 .../fpu/test-double-libmvec-tan-avx512f.c     |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-tan.c  |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-tanf-avx.c  |    1 +
 .../x86_64/fpu/test-float-libmvec-tanf-avx2.c |    1 +
 .../fpu/test-float-libmvec-tanf-avx512f.c     |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-tanf.c  |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 21913 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf.c