Message ID | 20211221054033.636994-1-skpgkp2@gmail.com |
---|---|
Headers |
From: Sunil K Pandey via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Sunil K Pandey <skpgkp2@gmail.com>
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com, marius.cornea@intel.com
Subject: [PATCH v6 0/1] Add vector math function acos/acosf to libmvec
Date: Mon, 20 Dec 2021 21:40:32 -0800
Message-Id: <20211221054033.636994-1-skpgkp2@gmail.com>
In-Reply-To: <CAFUsyfK8zDoTeyupRepNe-MFbfxfbntx5OXn=pnx3zVXvGpU-w@mail.gmail.com>
X-Mailer: git-send-email 2.31.1 |
Series | Add vector math function acos/acosf to libmvec |
Message
Sunil Pandey
Dec. 21, 2021, 5:40 a.m. UTC
This is a single-function patch, as suggested. We incorporated the following changes in v6; the rest of the libmvec patches will follow the same changes. Let me know if it looks reasonable.

Changes from v5:
- Replace 3 kmovw + andl with a kandw instruction.

Changes from v4:
- Restructure the data table and remove unused fields.
- Fix data table and field alignment according to ISA.
- Fix data offsets according to ISA.

Changes from v3:
- Remove exit-call dead code.
- Remove unnecessary save/restore.

Changes from v2:
- Keep cfi_escape for callee-saved registers only.
- Add a DW_CFA_expression comment corresponding to each cfi_escape.
- Define a macro corresponding to each numeric data table offset.
- Replace numeric data table offsets with macro names.
- Add the data table structure definition as comments.
- Restructure the data table and add comments to each data field value.
- Rename numeric sequential labels with meaningful label names.
- Add more comments to labels as well as on call sites.
- Replace internal special-value processing paths with calls to the standard scalar math functions, which makes the code more compact and consistent with the previous libmvec submission.

Changes from v1:
- Add ISA-specific sections for all libmvec functions.
- Add libmvec functions to math-vector-fortran.h.
- Change labels to sequential.
- Fix the function name in the GNU header plate.

This patch implements the acos/acosf vector math functions, with SSE, AVX, AVX2 and AVX512 versions, for libmvec as per the vector ABI. It also contains accuracy and ABI tests with regenerated ulps.
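The "Changes from v2" notes mention replacing the internal special-value paths with calls to the standard scalar math functions. The general pattern can be sketched in C as follows. The sketch assumes the vector kernel also produces a bitmask with one bit per lane, set when the input falls outside the fast path's domain (here |x| > 1 or NaN). The helper names `special_lane_mask`, `fixup_special_lanes` and `nan_fallback` are hypothetical. The patch itself does this in hand-written assembly, and real code would pass the scalar `acos` from `<math.h>` as the fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return a mask with bit i set when in[i] lies
   outside [-1, 1] or is NaN (NaN fails both comparisons).
   NLANES must be at most 32.  */
static unsigned special_lane_mask(const double *in, int nlanes)
{
    unsigned mask = 0;
    for (int i = 0; i < nlanes; i++)
        if (!(in[i] >= -1.0 && in[i] <= 1.0))
            mask |= 1u << i;
    return mask;
}

/* Hypothetical helper: for each set bit in MASK, overwrite the
   vector-path result for that lane with SCALAR applied to its input.  */
static void fixup_special_lanes(double *out, const double *in,
                                unsigned mask, double (*scalar)(double))
{
    assert(scalar != NULL);
    while (mask != 0) {
        int lane = __builtin_ctz(mask); /* lowest set bit (GCC/Clang builtin) */
        out[lane] = scalar(in[lane]);
        mask &= mask - 1;               /* clear that bit */
    }
}

/* Stand-in fallback for this demo only.  Scalar acos returns NaN for
   |x| > 1 and for NaN input, so a quiet NaN matches it on exactly the
   lanes flagged above.  Unlike the real acos, it does not set errno or
   raise FE_INVALID.  */
static double nan_fallback(double x)
{
    (void)x;
    return __builtin_nan("");
}
```

Passing the fallback as a function pointer keeps the sketch free of libm. Walking the mask with a count-trailing-zeros loop touches only the flagged lanes, so when no lane is special the fix-up costs a single test of the mask.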
Sunil K Pandey (1):
  x86-64: Add vector acos/acosf implementation to libmvec

bits/libm-simd-decl-stubs.h | 11 +
math/bits/mathcalls.h | 2 +-
.../unix/sysv/linux/x86_64/libmvec.abilist | 8 +
sysdeps/x86/fpu/bits/math-vector.h | 4 +
.../x86/fpu/finclude/math-vector-fortran.h | 4 +
sysdeps/x86_64/fpu/Makeconfig | 1 +
sysdeps/x86_64/fpu/Versions | 4 +
sysdeps/x86_64/fpu/libm-test-ulps | 20 ++
.../fpu/multiarch/ifunc-mathvec-avx512-skx.h | 39 +++
.../fpu/multiarch/svml_d_acos2_core-sse2.S | 20 ++
.../x86_64/fpu/multiarch/svml_d_acos2_core.c | 27 ++
.../fpu/multiarch/svml_d_acos2_core_sse4.S | 293 +++++++++++++++++
.../fpu/multiarch/svml_d_acos4_core-sse.S | 20 ++
.../x86_64/fpu/multiarch/svml_d_acos4_core.c | 27 ++
.../fpu/multiarch/svml_d_acos4_core_avx2.S | 273 ++++++++++++++++
.../fpu/multiarch/svml_d_acos8_core-avx2.S | 20 ++
.../x86_64/fpu/multiarch/svml_d_acos8_core.c | 27 ++
.../fpu/multiarch/svml_d_acos8_core_avx512.S | 295 ++++++++++++++++++
.../fpu/multiarch/svml_s_acosf16_core-avx2.S | 20 ++
.../fpu/multiarch/svml_s_acosf16_core.c | 28 ++
.../multiarch/svml_s_acosf16_core_avx512.S | 259 +++++++++++++++
.../fpu/multiarch/svml_s_acosf4_core-sse2.S | 20 ++
.../x86_64/fpu/multiarch/svml_s_acosf4_core.c | 28 ++
.../fpu/multiarch/svml_s_acosf4_core_sse4.S | 260 +++++++++++++++
.../fpu/multiarch/svml_s_acosf8_core-sse.S | 20 ++
.../x86_64/fpu/multiarch/svml_s_acosf8_core.c | 28 ++
.../fpu/multiarch/svml_s_acosf8_core_avx2.S | 252 +++++++++++++++
sysdeps/x86_64/fpu/svml_d_acos2_core.S | 29 ++
sysdeps/x86_64/fpu/svml_d_acos4_core.S | 29 ++
sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S | 25 ++
sysdeps/x86_64/fpu/svml_d_acos8_core.S | 25 ++
sysdeps/x86_64/fpu/svml_s_acosf16_core.S | 25 ++
sysdeps/x86_64/fpu/svml_s_acosf4_core.S | 29 ++
sysdeps/x86_64/fpu/svml_s_acosf8_core.S | 29 ++
sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S | 25 ++
.../x86_64/fpu/test-double-libmvec-acos-avx.c | 1 +
.../fpu/test-double-libmvec-acos-avx2.c | 1 +
.../fpu/test-double-libmvec-acos-avx512f.c | 1 +
sysdeps/x86_64/fpu/test-double-libmvec-acos.c | 3 +
.../x86_64/fpu/test-double-vlen2-wrappers.c | 1 +
.../fpu/test-double-vlen4-avx2-wrappers.c | 1 +
.../x86_64/fpu/test-double-vlen4-wrappers.c | 1 +
.../x86_64/fpu/test-double-vlen8-wrappers.c | 1 +
.../x86_64/fpu/test-float-libmvec-acosf-avx.c | 1 +
.../fpu/test-float-libmvec-acosf-avx2.c | 1 +
.../fpu/test-float-libmvec-acosf-avx512f.c | 1 +
sysdeps/x86_64/fpu/test-float-libmvec-acosf.c | 3 +
.../x86_64/fpu/test-float-vlen16-wrappers.c | 1 +
.../x86_64/fpu/test-float-vlen4-wrappers.c | 1 +
.../fpu/test-float-vlen8-avx2-wrappers.c | 1 +
.../x86_64/fpu/test-float-vlen8-wrappers.c | 1 +
51 files changed, 2245 insertions(+), 1 deletion(-)
create mode 100644 sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core-sse2.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core_sse4.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core_avx2.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core_avx512.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core_avx512.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core_sse4.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core_avx2.S
create mode 100644 sysdeps/x86_64/fpu/svml_d_acos2_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S
create mode 100644 sysdeps/x86_64/fpu/svml_d_acos8_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf16_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf4_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S
create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c
create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c
create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c
create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos.c
create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c
create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c
create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c
create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c
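The change notes also mention several table-related changes:
- defining a macro for each numeric data-table offset;
- documenting the table layout as a structure in comments;
- aligning the table fields according to the ISA.

A minimal C sketch of that idea follows, assuming 64-byte (one ZMM register) alignment per field. The struct name, macro names, and the three fields are hypothetical and are not taken from the patch. An assembly file cannot evaluate `offsetof`, so the sketch writes the offsets as literal macros and uses C11 `_Static_assert` to check each one against the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table layout: each field holds one 512-bit vector
   (8 lanes) and is aligned to 64 bytes, so a full-width AVX-512 load
   of any field is aligned.  */
struct acos_table_sketch {
    _Alignas(64) uint64_t abs_mask[8]; /* AND mask that clears the sign bit */
    _Alignas(64) double one_half[8];   /* 0.5 in every lane */
    _Alignas(64) double one[8];        /* 1.0 in every lane */
};

/* Named offsets, as an assembly file would #define them instead of
   writing bare numbers such as 64 or 128 in each load.  */
#define ABS_MASK_OFF 0
#define ONE_HALF_OFF 64
#define ONE_OFF 128

/* Compile-time checks that the literal offsets match the struct.  */
_Static_assert(offsetof(struct acos_table_sketch, abs_mask) == ABS_MASK_OFF,
               "abs_mask offset mismatch");
_Static_assert(offsetof(struct acos_table_sketch, one_half) == ONE_HALF_OFF,
               "one_half offset mismatch");
_Static_assert(offsetof(struct acos_table_sketch, one) == ONE_OFF,
               "one offset mismatch");

/* Replicate one value into all 8 lanes.  */
#define X8(v) { v, v, v, v, v, v, v, v }

static const struct acos_table_sketch acos_table = {
    .abs_mask = X8(0x7fffffffffffffffULL),
    .one_half = X8(0.5),
    .one = X8(1.0),
};

/* Return the address table + OFF, mirroring an assembly memory operand
   that adds a named offset to the table's base address.  */
static const void *table_field(size_t off)
{
    assert(off % 64 == 0 && off < sizeof acos_table);
    return (const char *)&acos_table + off;
}
```

With the checks in place, reordering or resizing a field causes a compile error instead of a silently wrong load.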
Comments
On Mon, Dec 20, 2021 at 11:40 PM Sunil K Pandey via Libc-alpha <libc-alpha@sourceware.org> wrote:
>
> This is a single-function patch, as suggested. We incorporated the following
> changes in v6; the rest of the libmvec patches will follow the same changes.
> Let me know if it looks reasonable.

The one thing I would want in the others is a comment next to register assignments with the line numbers that make up the live range. I think it will make maintenance a great deal simpler, and it should be easy enough to generate in the compiler.

Other than that, I think this is fine for basing the rest on. You might want to wait for Joseph's opinion, however.
On Mon, Dec 20, 2021 at 10:45 PM Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org> wrote:
>
> On Mon, Dec 20, 2021 at 11:40 PM Sunil K Pandey via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > This is a single-function patch, as suggested. We incorporated the following
> > changes in v6; the rest of the libmvec patches will follow the same changes.
> > Let me know if it looks reasonable.
>
> The one thing I would want in the others is a comment
> next to register assignments with the line numbers that
> make up the live range. I think it will make maintenance a great
> deal simpler, and it should be easy enough to generate in
> the compiler.

The Intel compiler can generate the following information at the basic-block level in its assembly output:

                                    # LOE rbx rbp r12 r13 r14 r15 eax
    ..B2.11:                        # Preds ..B2.10
                                    # Execution count [1.00e+00]
            ret                                                     #9370.16

We will clean it up and add it after glibc 2.35.