[v3,0/1] Add vector math function erfc/erfcf to libmvec

Message ID 20211230000004.3894570-1-skpgkp2@gmail.com

Sunil Pandey Dec. 30, 2021, midnight UTC
  This patch may look big, but 82% of it is data tables.

Changes from v2:
-  Replace big negative rip offset with Table Lookup Bias.
-  Remove more unused data table fields.
-  Include LOE (live on exit) register info.
-  Apply more peephole optimizations.
-  Optimize load of all bits set into ZMM register.
-  Replace 3 kmovw + andl with kandw instruction.
-  Restructure data table and remove unused fields.
-  Fix data table and field alignment according to ISA.
-  Fix data offset according to ISA.
-  Remove exit call dead code.
-  Remove unnecessary save/restore.
-  Keep cfi_escape for callee saved registers only.
-  Add DW_CFA_expression comments corresponding to each cfi_escape.
-  Define macro corresponding to each numeric data table offset.
-  Replace numeric data table offset with macro name.
-  Add data table structure definition as comments.
-  Restructure data table and add comments to each data field value.
-  Rename numeric sequential labels with meaningful label name.
-  Add more comments to labels as well as on call sites.
-  Replace internal special-value processing paths with calls to standard
   scalar math functions, which makes the code more compact and consistent
   with previous libmvec submissions.
  
Changes from v1:
-  Add ISA specific sections for all libmvec functions.
-  Add libmvec functions to math-vector-fortran.h.
-  Change labels to sequential numbering.
-  Fix function name in GNU header plate.

This patch implements erfc/erfcf vector math functions containing
SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI.
It also contains accuracy and ABI tests with regenerated ulps.

Sunil K Pandey (1):
  x86-64: Add vector erfc/erfcf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_erfc2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc2_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc2_core_sse4.S    | 3853 ++++++++++++++++
 .../fpu/multiarch/svml_d_erfc4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc4_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc4_core_avx2.S    | 3857 ++++++++++++++++
 .../fpu/multiarch/svml_d_erfc8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_erfc8_core.c  |   27 +
 .../fpu/multiarch/svml_d_erfc8_core_avx512.S  | 3860 +++++++++++++++++
 .../fpu/multiarch/svml_s_erfcf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_erfcf16_core.c       |   28 +
 .../multiarch/svml_s_erfcf16_core_avx512.S    |  932 ++++
 .../fpu/multiarch/svml_s_erfcf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_erfcf4_core.c |   28 +
 .../fpu/multiarch/svml_s_erfcf4_core_sse4.S   |  939 ++++
 .../fpu/multiarch/svml_s_erfcf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_erfcf8_core.c |   28 +
 .../fpu/multiarch/svml_s_erfcf8_core_avx2.S   |  957 ++++
 sysdeps/x86_64/fpu/svml_d_erfc2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_erfc4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_erfc4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_erfc8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_s_erfcf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_erfcf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_erfcf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_erfcf8_core_avx.S   |   25 +
 .../x86_64/fpu/test-double-libmvec-erfc-avx.c |    1 +
 .../fpu/test-double-libmvec-erfc-avx2.c       |    1 +
 .../fpu/test-double-libmvec-erfc-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-erfc.c |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-erfcf-avx.c |    1 +
 .../fpu/test-float-libmvec-erfcf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-erfcf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-erfcf.c |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 14970 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erfc8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erfcf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erfc8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erfcf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erfc.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erfcf.c