[v6,0/1] Add vector math function acos/acosf to libmvec

Message ID

20211221054033.636994-1-skpgkp2@gmail.com

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DDE9A385841F
To: libc-alpha@sourceware.org
Subject: [PATCH v6 0/1] Add vector math function acos/acosf to libmvec
Date: Mon, 20 Dec 2021 21:40:32 -0800
Message-Id: <20211221054033.636994-1-skpgkp2@gmail.com>
In-Reply-To: <CAFUsyfK8zDoTeyupRepNe-MFbfxfbntx5OXn=pnx3zVXvGpU-w@mail.gmail.com>
References: <CAFUsyfK8zDoTeyupRepNe-MFbfxfbntx5OXn=pnx3zVXvGpU-w@mail.gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
From: Sunil K Pandey via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Sunil K Pandey <skpgkp2@gmail.com>
Cc: andrey.kolesov@intel.com, marius.cornea@intel.com
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

Series

Add vector math function acos/acosf to libmvec
[v6,0/1] Add vector math function acos/acosf to libmvec
[v6,1/1] x86-64: Add vector acos/acosf implementation to libmvec

Message

Sunil Pandey Dec. 21, 2021, 5:40 a.m. UTC

This is a single-function patch, as suggested.  We incorporated the
following changes in v6; the rest of the libmvec patches will follow
with similar changes.  Let me know if this looks reasonable.

Changes from v5:
-  Replace 3 kmovw + andl with kandw instruction.

Changes from v4:
-  Restructure data table and remove unused fields.
-  Fix data table and field alignment according to ISA.
-  Fix data offset according to ISA.

Changes from v3:
-  Remove exit call dead code.
-  Remove unnecessary save/restore.

Changes from v2:
-  Keep cfi_escape for callee saved registers only.
-  Add DW_CFA_expression comments corresponding to each cfi_escape.
-  Define macro corresponding to each numeric data table offset.
-  Replace numeric data table offset with macro name.
-  Add data table structure definition as comments.
-  Restructure data table and add comments to each data field value.
-  Rename numeric sequential labels with meaningful label name.
-  Add more comments to labels as well as on call sites.
-  Replace internal special-value processing paths with calls to
   standard scalar math functions, which makes the code more compact
   and aligns it with the previous libmvec submission.
  
Changes from v1:
-  Add ISA specific sections for all libmvec functions.
-  Add libmvec functions to math-vector-fortran.h.
-  Change labels to sequential numbering.
-  Fix function name in GNU header plate.

This patch implements the acos/acosf vector math functions for libmvec,
with SSE, AVX, AVX2 and AVX512 versions as per the vector ABI.  It also
contains accuracy and ABI tests with regenerated ulps.

Sunil K Pandey (1):
  x86-64: Add vector acos/acosf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   4 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 ++
 .../fpu/multiarch/ifunc-mathvec-avx512-skx.h  |  39 +++
 .../fpu/multiarch/svml_d_acos2_core-sse2.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_d_acos2_core.c  |  27 ++
 .../fpu/multiarch/svml_d_acos2_core_sse4.S    | 293 +++++++++++++++++
 .../fpu/multiarch/svml_d_acos4_core-sse.S     |  20 ++
 .../x86_64/fpu/multiarch/svml_d_acos4_core.c  |  27 ++
 .../fpu/multiarch/svml_d_acos4_core_avx2.S    | 273 ++++++++++++++++
 .../fpu/multiarch/svml_d_acos8_core-avx2.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_d_acos8_core.c  |  27 ++
 .../fpu/multiarch/svml_d_acos8_core_avx512.S  | 295 ++++++++++++++++++
 .../fpu/multiarch/svml_s_acosf16_core-avx2.S  |  20 ++
 .../fpu/multiarch/svml_s_acosf16_core.c       |  28 ++
 .../multiarch/svml_s_acosf16_core_avx512.S    | 259 +++++++++++++++
 .../fpu/multiarch/svml_s_acosf4_core-sse2.S   |  20 ++
 .../x86_64/fpu/multiarch/svml_s_acosf4_core.c |  28 ++
 .../fpu/multiarch/svml_s_acosf4_core_sse4.S   | 260 +++++++++++++++
 .../fpu/multiarch/svml_s_acosf8_core-sse.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_s_acosf8_core.c |  28 ++
 .../fpu/multiarch/svml_s_acosf8_core_avx2.S   | 252 +++++++++++++++
 sysdeps/x86_64/fpu/svml_d_acos2_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_acos4_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S    |  25 ++
 sysdeps/x86_64/fpu/svml_d_acos8_core.S        |  25 ++
 sysdeps/x86_64/fpu/svml_s_acosf16_core.S      |  25 ++
 sysdeps/x86_64/fpu/svml_s_acosf4_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_acosf8_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S   |  25 ++
 .../x86_64/fpu/test-double-libmvec-acos-avx.c |   1 +
 .../fpu/test-double-libmvec-acos-avx2.c       |   1 +
 .../fpu/test-double-libmvec-acos-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-acos.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-acosf-avx.c |   1 +
 .../fpu/test-float-libmvec-acosf-avx2.c       |   1 +
 .../fpu/test-float-libmvec-acosf-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 51 files changed, 2245 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c

Comments

Noah Goldstein Dec. 21, 2021, 6:44 a.m. UTC | #1

On Mon, Dec 20, 2021 at 11:40 PM Sunil K Pandey via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> This is a single-function patch, as suggested.  We incorporated the
> following changes in v6; the rest of the libmvec patches will follow
> with similar changes.  Let me know if this looks reasonable.

The one thing I would want in the other patches is a comment next to
each register assignment giving the line numbers that make up its live
range.  I think it will make maintenance a great deal simpler, and it
should be easy enough to generate in the compiler.

Other than that, I think this is fine for basing the rest on. You
might want to wait for Joseph's opinion, however.


H.J. Lu Dec. 22, 2021, 12:15 a.m. UTC | #2

On Mon, Dec 20, 2021 at 10:45 PM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Mon, Dec 20, 2021 at 11:40 PM Sunil K Pandey via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > This is a single-function patch, as suggested.  We incorporated the
> > following changes in v6; the rest of the libmvec patches will follow
> > with similar changes.  Let me know if this looks reasonable.
>
> The one thing I would want in the other patches is a comment next to
> each register assignment giving the line numbers that make up its
> live range.  I think it will make maintenance a great deal simpler,
> and it should be easy enough to generate in the compiler.

The Intel compiler can generate the following information at the
basic-block level in its assembly output:

        # LOE rbx rbp r12 r13 r14 r15 eax
..B2.11:                        # Preds ..B2.10
        # Execution count [1.00e+00]
        ret                     #9370.16

We will clean it up and add it after glibc 2.35.

> Other than that, I think this is fine for basing the rest on. You
> might want to wait for Joseph's opinion, however.