From patchwork Thu Dec 30 00:00:03 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sunil Pandey <skpgkp2@gmail.com>
X-Patchwork-Id: 49403
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 58FB8385802B
	for <patchwork@sourceware.org>; Thu, 30 Dec 2021 00:00:29 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 58FB8385802B
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1640822429;
	bh=hfa/LD8Wb4hU+cwDBtc81rs/Xo3wSSPr/XZzNd8nN+M=;
	h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:Cc:From;
	b=l4Qd5jIZa2exkZcKLDPUTothuKbGWKPw9AOoyGAjxo36ot/+EGyniodVMoK5IBQjF
	 cBAqGwnr40uik39doAhPMCTKh2tu7JUNSrYC66FfUbEXALOu49wcWZq7Ca6nOkWFoJ
	 SKkpU6/GidE5JnCWWCr5cxsHCcjk2jhgu5nEENBY=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
 by sourceware.org (Postfix) with ESMTPS id 742DA3858400
 for <libc-alpha@sourceware.org>; Thu, 30 Dec 2021 00:00:07 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 742DA3858400
X-IronPort-AV: E=McAfee;i="6200,9189,10212"; a="241390467"
X-IronPort-AV: E=Sophos;i="5.88,246,1635231600"; d="scan'208";a="241390467"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
 by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 29 Dec 2021 16:00:05 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.88,246,1635231600"; d="scan'208";a="468650695"
Received: from scymds02.sc.intel.com ([10.82.73.244])
 by orsmga003.jf.intel.com with ESMTP; 29 Dec 2021 16:00:05 -0800
Received: from gskx-1.sc.intel.com (gskx-1.sc.intel.com [172.25.149.211])
 by scymds02.sc.intel.com with ESMTP id 1BU004KX008284;
 Wed, 29 Dec 2021 16:00:04 -0800
To: libc-alpha@sourceware.org
Subject: [PATCH v3 0/1] Add vector math function erfc/erfcf to libmvec
Date: Wed, 29 Dec 2021 16:00:03 -0800
Message-Id: <20211230000004.3894570-1-skpgkp2@gmail.com>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
X-Spam-Status: No, score=-0.6 required=5.0 tests=BAYES_00,
 DKIM_ADSP_CUSTOM_MED,
 FORGED_GMAIL_RCVD, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM,
 HK_RANDOM_ENVFROM, HK_RANDOM_FROM, KAM_DMARC_NONE, KAM_DMARC_STATUS,
 NML_ADSP_CUSTOM_MED, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_SOFTFAIL, SPOOFED_FREEMAIL, SPOOF_GMAIL_MID,
 TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Sunil K Pandey via Libc-alpha
 <libc-alpha@sourceware.org>
From: Sunil Pandey <skpgkp2@gmail.com>
Reply-To: Sunil K Pandey <skpgkp2@gmail.com>
Cc: andrey.kolesov@intel.com, marius.cornea@intel.com
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

This patch may look big, but 82% of it is data tables.

Changes from v2:
-  Replace big negative rip offset with Table Lookup Bias.
-  Remove more unused data table fields.
-  Include LOE (live on exit) register info.
-  Apply more peephole optimization.
-  Optimize loading the all-bits-set value into a ZMM register.
-  Replace 3 kmovw + andl with a kandw instruction.
-  Restructure data table and remove unused fields.
-  Fix data table and field alignment according to ISA.
-  Fix data offset according to ISA.
-  Remove exit call dead code.
-  Remove unnecessary save/restore.
-  Keep cfi_escape for callee saved registers only.
-  Add DW_CFA_expression comments corresponding to each cfi_escape.
-  Define macro corresponding to each numeric data table offset.
-  Replace numeric data table offset with macro name.
-  Add data table structure definition as comments.
-  Restructure data table and add comments to each data field value.
-  Rename numeric sequential labels with meaningful label name.
-  Add more comments to labels as well as on call sites.
-  Replace internal special-value processing paths with calls to standard
   scalar math functions, which makes the code more compact and aligned
   with previous libmvec submissions.
  
Changes from v1:
-  Add ISA specific sections for all libmvec functions.
-  Add libmvec functions to math-vector-fortran.h.
-  Change labels to sequential numbering.
-  Fix function name in GNU header plate.

This patch implements the erfc/erfcf vector math functions for libmvec,
with SSE, AVX, AVX2 and AVX512 versions, as per the vector ABI.
It also contains accuracy and ABI tests with regenerated ulps.
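
For readers unfamiliar with the vector ABI naming, here is a minimal
sketch of how the exported symbol names are presumably formed. It assumes
the usual x86-64 vector ABI mangling already used by other libmvec
functions (_ZGV<isa>N<vlen>v_<name>, with ISA letters b/c/d/e for
SSE/AVX/AVX2/AVX512); the helper names vector_symbol and
print_erfc_symbols are hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stdio.h>

/* One ISA variant: the vector ABI letter and the register width in bits.
   Assumed mapping: 'b' = SSE, 'c' = AVX, 'd' = AVX2, 'e' = AVX512.  */
struct isa_variant
{
  char letter;
  int vector_bits;
};

static const struct isa_variant variants[] = {
  { 'b', 128 }, { 'c', 256 }, { 'd', 256 }, { 'e', 512 },
};

/* Hypothetical helper: build "_ZGV<isa>N<vlen>v_<name>".
   'N' means unmasked, 'v' means one vector argument.
   elem_bits is 64 for double and 32 for float.  */
static int
vector_symbol (char *buf, size_t size, struct isa_variant v,
               int elem_bits, const char *name)
{
  assert (elem_bits > 0 && v.vector_bits % elem_bits == 0);
  int vlen = v.vector_bits / elem_bits;
  return snprintf (buf, size, "_ZGV%cN%dv_%s", v.letter, vlen, name);
}

/* Hypothetical helper: print the double and float variants for every
   ISA, one name per line.  */
static void
print_erfc_symbols (FILE *out)
{
  char buf[64];
  for (size_t i = 0; i < sizeof variants / sizeof variants[0]; i++)
    {
      vector_symbol (buf, sizeof buf, variants[i], 64, "erfc");
      fprintf (out, "%s\n", buf);
      vector_symbol (buf, sizeof buf, variants[i], 32, "erfcf");
      fprintf (out, "%s\n", buf);
    }
}
```

Calling print_erfc_symbols (stdout) from a main function lists eight
names, which is consistent with the eight lines added to libmvec.abilist
and with the vector lengths in the file names (erfc2/4/8 and
erfcf4/8/16).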

Sunil K Pandey (1):
  x86-64: Add vector erfc/erfcf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_erfc2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc2_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc2_core_sse4.S    | 3853 ++++++++++++++++
 .../fpu/multiarch/svml_d_erfc4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc4_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc4_core_avx2.S    | 3857 ++++++++++++++++
 .../fpu/multiarch/svml_d_erfc8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc8_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc8_core_avx512.S  | 3860 +++++++++++++++++
 .../fpu/multiarch/svml_s_erfcf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_erfcf16_core.c       |   28 +
 .../multiarch/svml_s_erfcf16_core_avx512.S    |  932 ++++
 .../fpu/multiarch/svml_s_erfcf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_erfcf4_core.c |   28 +
 .../fpu/multiarch/svml_s_erfcf4_core_sse4.S   |  939 ++++
 .../fpu/multiarch/svml_s_erfcf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_erfcf8_core.c |   28 +
 .../fpu/multiarch/svml_s_erfcf8_core_avx2.S   |  957 ++++
 sysdeps/x86_64/fpu/svml_d_erfc2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_erfc4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_erfc4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_erfc8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_s_erfcf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_erfcf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_erfcf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_erfcf8_core_avx.S   |   25 +
 .../x86_64/fpu/test-double-libmvec-erfc-avx.c |    1 +
 .../fpu/test-double-libmvec-erfc-avx2.c       |    1 +
 .../fpu/test-double-libmvec-erfc-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-erfc.c |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-erfcf-avx.c |    1 +
 .../fpu/test-float-libmvec-erfcf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-erfcf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-erfcf.c |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 14970 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf.c