From patchwork Thu Dec 30 00:03:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sunil Pandey <skpgkp2@gmail.com>
X-Patchwork-Id: 49405
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id A856E3858038
	for <patchwork@sourceware.org>; Thu, 30 Dec 2021 00:04:31 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A856E3858038
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1640822671;
	bh=M0iIMk9CqZ15ZM49yDlfe4ht/u/t+KaG7Z3hQKU76tY=;
	h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:Cc:From;
	b=QXhw/NwdRw8tspk6vMlqSbjxLBsAgmi3dUJbLQnQKmbbIZ2BollDw+gIH4Oaifso5
	 9M8NZB9czfhzkntHNH19KQpqqya/8ZJ44ptBhNtYgCRzV1mCJ6w1S/WbGqH267kuFV
	 hOe0pFU+VR/72E9P3lsEA/Wn3CXrXx5RFDoVQWCE=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by sourceware.org (Postfix) with ESMTPS id C450F385802E
 for <libc-alpha@sourceware.org>; Thu, 30 Dec 2021 00:03:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C450F385802E
X-IronPort-AV: E=McAfee;i="6200,9189,10212"; a="239132781"
X-IronPort-AV: E=Sophos;i="5.88,246,1635231600"; d="scan'208";a="239132781"
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
 by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 29 Dec 2021 16:03:59 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.88,246,1635231600"; d="scan'208";a="589406099"
Received: from scymds02.sc.intel.com ([10.82.73.244])
 by fmsmga004.fm.intel.com with ESMTP; 29 Dec 2021 16:03:58 -0800
Received: from gskx-1.sc.intel.com (gskx-1.sc.intel.com [172.25.149.211])
 by scymds02.sc.intel.com with ESMTP id 1BU03whQ009347;
 Wed, 29 Dec 2021 16:03:58 -0800
To: libc-alpha@sourceware.org
Subject: [PATCH v3 0/1] Add vector math function tan/tanf to libmvec
Date: Wed, 29 Dec 2021 16:03:57 -0800
Message-Id: <20211230000358.3894697-1-skpgkp2@gmail.com>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
X-Spam-Status: No, score=-0.6 required=5.0 tests=BAYES_00,
 DKIM_ADSP_CUSTOM_MED,
 FORGED_GMAIL_RCVD, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM,
 HK_RANDOM_ENVFROM, HK_RANDOM_FROM, KAM_DMARC_NONE, KAM_DMARC_STATUS,
 NML_ADSP_CUSTOM_MED, SPF_HELO_NONE, SPF_SOFTFAIL, SPOOFED_FREEMAIL,
 SPOOF_GMAIL_MID, TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Sunil K Pandey via Libc-alpha
 <libc-alpha@sourceware.org>
From: Sunil Pandey <skpgkp2@gmail.com>
Reply-To: Sunil K Pandey <skpgkp2@gmail.com>
Cc: andrey.kolesov@intel.com, marius.cornea@intel.com
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

This patch may look big, but 74% of it is data tables.

Changes from v2:
-  Replace big negative rip offset with Table Lookup Bias.
-  Remove more unused data table fields.
-  Include LOE (live on exit) register info.
-  Apply more peephole optimizations.
-  Optimize load of all bits set into ZMM register.
-  Replace 3 kmovw + andl with kandw instruction.
-  Restructure data table and remove unused fields.
-  Fix data table and field alignment according to ISA.
-  Fix data offset according to ISA.
-  Remove exit call dead code.
-  Remove unnecessary save/restore.
-  Keep cfi_escape for callee saved registers only.
-  Add DW_CFA_expression comments corresponding to each cfi_escape.
-  Define macro corresponding to each numeric data table offset.
-  Replace numeric data table offset with macro name.
-  Add data table structure definition as comments.
-  Restructure data table and add comments to each data field value.
-  Rename numeric sequential labels with meaningful label name.
-  Add more comments to labels as well as on call sites.
-  Replace internal special value processing paths with calls to
   standard scalar math functions, which makes the code more compact
   and aligns it with previous libmvec submissions.
  
Changes from v1:
-  Add ISA specific sections for all libmvec functions.
-  Add libmvec functions to math-vector-fortran.h.
-  Change labels to sequential numbering.
-  Fix function name in GNU header plate.

This patch implements the tan/tanf vector math functions for libmvec,
with SSE, AVX, AVX2 and AVX512 versions as per the vector ABI.
It also contains accuracy and ABI tests with regenerated ulps.

Sunil K Pandey (1):
  x86-64: Add vector tan/tanf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_tan2_core-sse2.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_tan2_core.c   |   27 +
 .../fpu/multiarch/svml_d_tan2_core_sse4.S     | 6259 +++++++++++++++++
 .../fpu/multiarch/svml_d_tan4_core-sse.S      |   20 +
 .../x86_64/fpu/multiarch/svml_d_tan4_core.c   |   27 +
 .../fpu/multiarch/svml_d_tan4_core_avx2.S     | 6227 ++++++++++++++++
 .../fpu/multiarch/svml_d_tan8_core-avx2.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_tan8_core.c   |   27 +
 .../fpu/multiarch/svml_d_tan8_core_avx512.S   | 2733 +++++++
 .../fpu/multiarch/svml_s_tanf16_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanf16_core.c |   28 +
 .../fpu/multiarch/svml_s_tanf16_core_avx512.S |  927 +++
 .../fpu/multiarch/svml_s_tanf4_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanf4_core.c  |   28 +
 .../fpu/multiarch/svml_s_tanf4_core_sse4.S    | 2600 +++++++
 .../fpu/multiarch/svml_s_tanf8_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanf8_core.c  |   28 +
 .../fpu/multiarch/svml_s_tanf8_core_avx2.S    | 2595 +++++++
 sysdeps/x86_64/fpu/svml_d_tan2_core.S         |   29 +
 sysdeps/x86_64/fpu/svml_d_tan4_core.S         |   29 +
 sysdeps/x86_64/fpu/svml_d_tan4_core_avx.S     |   25 +
 sysdeps/x86_64/fpu/svml_d_tan8_core.S         |   25 +
 sysdeps/x86_64/fpu/svml_s_tanf16_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_tanf4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_s_tanf8_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_s_tanf8_core_avx.S    |   25 +
 .../x86_64/fpu/test-double-libmvec-tan-avx.c  |    1 +
 .../x86_64/fpu/test-double-libmvec-tan-avx2.c |    1 +
 .../fpu/test-double-libmvec-tan-avx512f.c     |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-tan.c  |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-tanf-avx.c  |    1 +
 .../x86_64/fpu/test-float-libmvec-tanf-avx2.c |    1 +
 .../fpu/test-float-libmvec-tanf-avx512f.c     |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-tanf.c  |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 21913 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tan8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tan8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tan.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanf.c