[v4,0/1] Add vector math function acos/acosf to libmvec

Message ID	20211216001239.3648099-1-skpgkp2@gmail.com
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AC94B3858402 To: libc-alpha@sourceware.org Subject: [PATCH v4 0/1] Add vector math function acos/acosf to libmvec Date: Wed, 15 Dec 2021 16:12:38 -0800 Message-Id: <20211216001239.3648099-1-skpgkp2@gmail.com> In-Reply-To: <CAFUsyfJ+sURnik75vJ5XXpS9FwZruoH4yNOECg6=eH3qH1uASg@mail.gmail.com> References: <CAFUsyfJ+sURnik75vJ5XXpS9FwZruoH4yNOECg6=eH3qH1uASg@mail.gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list From: Sunil K Pandey via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Sunil K Pandey <skpgkp2@gmail.com> Cc: andrey.kolesov@intel.com Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
Series	Add vector math function acos/acosf to libmvec \| [v4,0/1] Add vector math function acos/acosf to libmvec [v4,1/1] x86-64: Add vector acos/acosf implementation to libmvec

Message ID

20211216001239.3648099-1-skpgkp2@gmail.com

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AC94B3858402
To: libc-alpha@sourceware.org
Subject: [PATCH v4 0/1] Add vector math function acos/acosf to libmvec
Date: Wed, 15 Dec 2021 16:12:38 -0800
Message-Id: <20211216001239.3648099-1-skpgkp2@gmail.com>
In-Reply-To: 
 <CAFUsyfJ+sURnik75vJ5XXpS9FwZruoH4yNOECg6=eH3qH1uASg@mail.gmail.com>
References: 
 <CAFUsyfJ+sURnik75vJ5XXpS9FwZruoH4yNOECg6=eH3qH1uASg@mail.gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
From: Sunil K Pandey via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Sunil K Pandey <skpgkp2@gmail.com>
Cc: andrey.kolesov@intel.com
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

Series

Add vector math function acos/acosf to libmvec |

Message

Sunil Pandey Dec. 16, 2021, 12:12 a.m. UTC

  This is single function patch as suggested.  We incorporated following
changes in v4.  Rest of the libmvec patches will follow similar change.
Let me know if it looks reasonable?

Changes from v3:
-  Remove exit call dead code.
-  Remove unnecessary save/restore.

Changes from v2:
-  Keep cfi_escape for callee saved registers only.
-  Add DW_CFA_expression comments corresponding to each cfi_escape.
-  Define macro corresponding to each numeric data table offset.
-  Replace numeric data table offset with macro name.
-  Add data table structure definition as comments.
-  Restructure data table and add comments to each data field value.
-  Rename numeric sequential labels with meaningful label name.
-  Add more comments to labels as well as on call sites.
-  Internal special value processing paths replaced by calls to standard
   scalar math functions, makes code more compact and aligned with
   previous libmvec submission.
  
Changes from v1:
-  Add ISA specific sections for all libmvec functions.
-  Add libmvec functions to math-vector-fortran.h.
-  Change label to sequential.
-  Fix function name in GNU header plate.

This patch implements acos/acosf vector math functions containing
SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI.
It also contains accuracy and ABI tests with regenerated ulps.

Sunil K Pandey (1):
  x86-64: Add vector acos/acosf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   4 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../multiarch/.svml_s_acosf4_core_sse4.S.swp  | Bin 0 -> 16384 bytes
 .../fpu/multiarch/ifunc-mathvec-avx512-skx.h  |  39 ++
 .../fpu/multiarch/svml_d_acos2_core-sse2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_acos2_core.c  |  27 ++
 .../fpu/multiarch/svml_d_acos2_core_sse4.S    | 369 ++++++++++++++++++
 .../fpu/multiarch/svml_d_acos4_core-sse.S     |  20 +
 .../x86_64/fpu/multiarch/svml_d_acos4_core.c  |  27 ++
 .../fpu/multiarch/svml_d_acos4_core_avx2.S    | 335 ++++++++++++++++
 .../fpu/multiarch/svml_d_acos8_core-avx2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_acos8_core.c  |  27 ++
 .../fpu/multiarch/svml_d_acos8_core_avx512.S  | 365 +++++++++++++++++
 .../fpu/multiarch/svml_s_acosf16_core-avx2.S  |  20 +
 .../fpu/multiarch/svml_s_acosf16_core.c       |  28 ++
 .../multiarch/svml_s_acosf16_core_avx512.S    | 311 +++++++++++++++
 .../fpu/multiarch/svml_s_acosf4_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_s_acosf4_core.c |  28 ++
 .../fpu/multiarch/svml_s_acosf4_core_sse4.S   | 321 +++++++++++++++
 .../fpu/multiarch/svml_s_acosf8_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_s_acosf8_core.c |  28 ++
 .../fpu/multiarch/svml_s_acosf8_core_avx2.S   | 299 ++++++++++++++
 sysdeps/x86_64/fpu/svml_d_acos2_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_acos4_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S    |  25 ++
 sysdeps/x86_64/fpu/svml_d_acos8_core.S        |  25 ++
 sysdeps/x86_64/fpu/svml_s_acosf16_core.S      |  25 ++
 sysdeps/x86_64/fpu/svml_s_acosf4_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_acosf8_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S   |  25 ++
 .../x86_64/fpu/test-double-libmvec-acos-avx.c |   1 +
 .../fpu/test-double-libmvec-acos-avx2.c       |   1 +
 .../fpu/test-double-libmvec-acos-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-acos.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-acosf-avx.c |   1 +
 .../fpu/test-float-libmvec-acosf-avx2.c       |   1 +
 .../fpu/test-float-libmvec-acosf-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 52 files changed, 2613 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/.svml_s_acosf4_core_sse4.S.swp
 create mode 100644 sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acos8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c

Comments

Joseph Myers Dec. 16, 2021, 7:18 p.m. UTC | #1

I'd like to know more about where this assembly code came from.  Is it 
manually written?  Generated by a compiler and then manually edited, as 
suggested in previous discussions?  If it was generated by a compiler but 
then sufficiently edited that the assembly version is now the preferred 
form for subsequent modifications, it would still be helpful to see the 
original input to that compiler (e.g. as a comment in the .S files), in 
order to assess the relative merits of C and assembly sources for this 
code and the possibilities of adapting to other architectures - and also 
to see the original compiler output so we can judge the extent of the 
editing and work out what is in fact the preferred form for modifications.

Adhemerval Zanella Netto Dec. 16, 2021, 9:13 p.m. UTC | #2

On 16/12/2021 16:18, Joseph Myers wrote:
> I'd like to know more about where this assembly code came from.  Is it 
> manually written?  Generated by a compiler and then manually edited, as 
> suggested in previous discussions?  If it was generated by a compiler but 
> then sufficiently edited that the assembly version is now the preferred 
> form for subsequent modifications, it would still be helpful to see the 
> original input to that compiler (e.g. as a comment in the .S files), in 
> order to assess the relative merits of C and assembly sources for this 
> code and the possibilities of adapting to other architectures - and also 
> to see the original compiler output so we can judge the extent of the 
> editing and work out what is in fact the preferred form for modifications.
> 


Besides it I am curious why it can't be coded in C with proper builtins
plus some portability wrapper, as libraries like SLEEF does.  With some 
care we can make generic code that can ported to other vector ABIs, and 
add optimizations only its is really requires (as FMA for the newer math
functions).

H.J. Lu Dec. 19, 2021, 8:34 p.m. UTC | #3

On Thu, Dec 16, 2021 at 06:13:32PM -0300, GNU C Library wrote:
> 
> 
> On 16/12/2021 16:18, Joseph Myers wrote:
> > I'd like to know more about where this assembly code came from.  Is it 
> > manually written?  Generated by a compiler and then manually edited, as 
> > suggested in previous discussions?  If it was generated by a compiler but 
> > then sufficiently edited that the assembly version is now the preferred 
> > form for subsequent modifications, it would still be helpful to see the 
> > original input to that compiler (e.g. as a comment in the .S files), in 
> > order to assess the relative merits of C and assembly sources for this 
> > code and the possibilities of adapting to other architectures - and also 
> > to see the original compiler output so we can judge the extent of the 
> > editing and work out what is in fact the preferred form for modifications.
> > 
> 
> 
> Besides it I am curious why it can't be coded in C with proper builtins
> plus some portability wrapper, as libraries like SLEEF does.  With some 
> care we can make generic code that can ported to other vector ABIs, and 
> add optimizations only its is really requires (as FMA for the newer math
> functions).

libmvec functions are from Intel SVML library, which are x86-64 sepecific.
They need to be compiled by a special version of Intel compiler for
correctness, performance and accuracy.  We do post-processing on Intel
compiler generated codes for glibc contribution.


H.J.

Adhemerval Zanella Netto Dec. 20, 2021, 7:10 p.m. UTC | #4

On 19/12/2021 17:34, H.J. Lu wrote:
> On Thu, Dec 16, 2021 at 06:13:32PM -0300, GNU C Library wrote:
>>
>>
>> On 16/12/2021 16:18, Joseph Myers wrote:
>>> I'd like to know more about where this assembly code came from.  Is it 
>>> manually written?  Generated by a compiler and then manually edited, as 
>>> suggested in previous discussions?  If it was generated by a compiler but 
>>> then sufficiently edited that the assembly version is now the preferred 
>>> form for subsequent modifications, it would still be helpful to see the 
>>> original input to that compiler (e.g. as a comment in the .S files), in 
>>> order to assess the relative merits of C and assembly sources for this 
>>> code and the possibilities of adapting to other architectures - and also 
>>> to see the original compiler output so we can judge the extent of the 
>>> editing and work out what is in fact the preferred form for modifications.
>>>
>>
>>
>> Besides it I am curious why it can't be coded in C with proper builtins
>> plus some portability wrapper, as libraries like SLEEF does.  With some 
>> care we can make generic code that can ported to other vector ABIs, and 
>> add optimizations only its is really requires (as FMA for the newer math
>> functions).
> 
> libmvec functions are from Intel SVML library, which are x86-64 sepecific.
> They need to be compiled by a special version of Intel compiler for
> correctness, performance and accuracy.  We do post-processing on Intel
> compiler generated codes for glibc contribution.

From the weekly call, If I understood correctly these assembly routines
are build from another high level language to C implementation and then
the Intel compiler will generate the assembly which will post-processed in
the current form.

I am not sure if this is the best course of action, specially because
we don't have access neither to the high-level code, nor to theused compiler
to generate such assembly routines.  I give that the resulting code might
better than C code the tool dumps, but all these process are really hard to
justify in a open-source tool like glibc.

What I meant why not use something like SLEEF is that using the C as
the base source for the high code generation might better to describe
the whole process and generates more understandably and maintainable
code that could be reused more easily to different architecture and vector
ABIs.

I give you that it might no generate the most optimized and fine-tuned
code, but it also leverages compiler support and move to the current
practice to remove arch-specific tuned implementations in favor of
C generic ones.

For instance, SLEEF code for vector cosf implementation [1], uses slim
wrapper (vlt_vo_vf_vf for instance) that is reimplemented by each vector
ABI [1].  It can be easily adapted to different vector ABIs without
much work.

[1] https://github.com/shibatch/sleef/blob/master/src/libm/sleefsimdsp.c
[2] https://github.com/shibatch/sleef/blob/master/src/arch/helperavx512f.h

H.J. Lu Dec. 20, 2021, 7:55 p.m. UTC | #5

On Mon, Dec 20, 2021 at 11:10 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 19/12/2021 17:34, H.J. Lu wrote:
> > On Thu, Dec 16, 2021 at 06:13:32PM -0300, GNU C Library wrote:
> >>
> >>
> >> On 16/12/2021 16:18, Joseph Myers wrote:
> >>> I'd like to know more about where this assembly code came from.  Is it
> >>> manually written?  Generated by a compiler and then manually edited, as
> >>> suggested in previous discussions?  If it was generated by a compiler but
> >>> then sufficiently edited that the assembly version is now the preferred
> >>> form for subsequent modifications, it would still be helpful to see the
> >>> original input to that compiler (e.g. as a comment in the .S files), in
> >>> order to assess the relative merits of C and assembly sources for this
> >>> code and the possibilities of adapting to other architectures - and also
> >>> to see the original compiler output so we can judge the extent of the
> >>> editing and work out what is in fact the preferred form for modifications.
> >>>
> >>
> >>
> >> Besides it I am curious why it can't be coded in C with proper builtins
> >> plus some portability wrapper, as libraries like SLEEF does.  With some
> >> care we can make generic code that can ported to other vector ABIs, and
> >> add optimizations only its is really requires (as FMA for the newer math
> >> functions).
> >
> > libmvec functions are from Intel SVML library, which are x86-64 sepecific.
> > They need to be compiled by a special version of Intel compiler for
> > correctness, performance and accuracy.  We do post-processing on Intel
> > compiler generated codes for glibc contribution.
>
> From the weekly call, If I understood correctly these assembly routines
> are build from another high level language to C implementation and then
> the Intel compiler will generate the assembly which will post-processed in
> the current form.
>
> I am not sure if this is the best course of action, specially because
> we don't have access neither to the high-level code, nor to theused compiler
> to generate such assembly routines.  I give that the resulting code might
> better than C code the tool dumps, but all these process are really hard to
> justify in a open-source tool like glibc.

Intel is committed to supporting x86 assembly codes in glibc.

> What I meant why not use something like SLEEF is that using the C as
> the base source for the high code generation might better to describe
> the whole process and generates more understandably and maintainable
> code that could be reused more easily to different architecture and vector
> ABIs.
>
> I give you that it might no generate the most optimized and fine-tuned
> code, but it also leverages compiler support and move to the current
> practice to remove arch-specific tuned implementations in favor of
> C generic ones.
>
> For instance, SLEEF code for vector cosf implementation [1], uses slim
> wrapper (vlt_vo_vf_vf for instance) that is reimplemented by each vector
> ABI [1].  It can be easily adapted to different vector ABIs without
> much work.
>
> [1] https://github.com/shibatch/sleef/blob/master/src/libm/sleefsimdsp.c
> [2] https://github.com/shibatch/sleef/blob/master/src/arch/helperavx512f.h

The Intel SVML team compared SVML vs SLEEF a while back.  SVML
functions were ~1.5x faster on average than those from SLEEF.  One major
issue with SLEEF is that it lacks support for large arguments.

For x86-64 libmvec in glibc 2.35, we want assembly codes from SVML.  We
can revisit it in the future when performance and accuracy of SLEEF have
been improved.

Thanks.

Joseph Myers Dec. 20, 2021, 9:41 p.m. UTC | #6

On Sun, 19 Dec 2021, H.J. Lu via Libc-alpha wrote:

> libmvec functions are from Intel SVML library, which are x86-64 sepecific.
> They need to be compiled by a special version of Intel compiler for
> correctness, performance and accuracy.  We do post-processing on Intel
> compiler generated codes for glibc contribution.

Please provide the input to that generation process (the original 
human-written code), the output and details of the post-processing done.

We can't accept this code unless the assembly code is actually the 
preferred form for modification - so that it would make more sense to 
change the algorithm used by changing the assembly code rather than 
changing some step of the generation process, for example.  To judge that 
we need to see the generation inputs and outputs.

develop--- via Libc-alpha Dec. 20, 2021, 10:07 p.m. UTC | #7

Hello Joseph,
I am sorry, but the original code was written in an internal Intel format used to generate multiple math libraries (not just SVML), and is not written in a human-readable format, unless one has extensive knowledge of relevant Intel processes. As such, it cannot be shared - it would not help at all. 
The situation was the same with our previous libmvec contribution, of 2016.
Thank you,
Marius Cornea

-----Original Message-----
From: Joseph Myers <joseph@codesourcery.com> 
Sent: Monday, December 20, 2021 1:41 PM
To: H.J. Lu <hjl.tools@gmail.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>; libc-alpha@sourceware.org; Kolesov, Andrey <Andrey.Kolesov@intel.com>; Cornea, Marius <marius.cornea@intel.com>
Subject: Re: [PATCH v4 0/1] Add vector math function acos/acosf to libmvec

On Sun, 19 Dec 2021, H.J. Lu via Libc-alpha wrote:

> libmvec functions are from Intel SVML library, which are x86-64 sepecific.
> They need to be compiled by a special version of Intel compiler for 
> correctness, performance and accuracy.  We do post-processing on Intel 
> compiler generated codes for glibc contribution.

Please provide the input to that generation process (the original human-written code), the output and details of the post-processing done.

We can't accept this code unless the assembly code is actually the preferred form for modification - so that it would make more sense to change the algorithm used by changing the assembly code rather than changing some step of the generation process, for example.  To judge that we need to see the generation inputs and outputs.

Joseph Myers Dec. 20, 2021, 10:19 p.m. UTC | #8

On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote:

> I am sorry, but the original code was written in an internal Intel 
> format used to generate multiple math libraries (not just SVML), and is 
> not written in a human-readable format, unless one has extensive 
> knowledge of relevant Intel processes. As such, it cannot be shared - it 
> would not help at all.

That doesn't convince me that the assembly code proposed now is source 
code (the preferred form for modifications).  If you can't demonstrate 
that via showing the extent of manual modifications to the output is 
sufficient that the output has become the preferred form for further 
modifications, you'll need to find some other way to demonstrate that this 
contribution actually consists of source code and is free software at all.

> The situation was the same with our previous libmvec contribution, of 2016.

Well, if it turns out that's not actually the preferred form for 
modifications, we'll need to remove it from all glibc branches with that 
code on (and replace it with dumb wrappers round scalar functions to 
preserve the ABI, I suppose).

develop--- via Libc-alpha Dec. 20, 2021, 10:42 p.m. UTC | #9

Hi Joseph,
We put a lot of effort in making the changes you requested a few weeks ago, and we thought those were sufficient. 
Accepting this code now for GLIBC 2.35, updated according with your requests and at the level of the 2016 contributions, would benefit a lot is GCC/GLIBC users on all x86 CPUs.
We can try to find a better solution in the future and we are willing to help, but that will take significantly more time.
Maybe H.J. and Sunil can add to this.
Thanks,
Marius Cornea

-----Original Message-----
From: Joseph Myers <joseph@codesourcery.com> 
Sent: Monday, December 20, 2021 2:19 PM
To: Cornea, Marius <marius.cornea@intel.com>
Cc: H.J. Lu <hjl.tools@gmail.com>; Kolesov, Andrey <Andrey.Kolesov@intel.com>; libc-alpha@sourceware.org
Subject: RE: [PATCH v4 0/1] Add vector math function acos/acosf to libmvec

On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote:

> I am sorry, but the original code was written in an internal Intel 
> format used to generate multiple math libraries (not just SVML), and 
> is not written in a human-readable format, unless one has extensive 
> knowledge of relevant Intel processes. As such, it cannot be shared - 
> it would not help at all.

That doesn't convince me that the assembly code proposed now is source code (the preferred form for modifications).  If you can't demonstrate that via showing the extent of manual modifications to the output is sufficient that the output has become the preferred form for further modifications, you'll need to find some other way to demonstrate that this contribution actually consists of source code and is free software at all.

> The situation was the same with our previous libmvec contribution, of 2016.

Well, if it turns out that's not actually the preferred form for modifications, we'll need to remove it from all glibc branches with that code on (and replace it with dumb wrappers round scalar functions to preserve the ABI, I suppose).

--
Joseph S. Myers
joseph@codesourcery.com

Joseph Myers Dec. 20, 2021, 10:57 p.m. UTC | #10

On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote:

> We put a lot of effort in making the changes you requested a few weeks 
> ago, and we thought those were sufficient.

A few weeks ago, there was no mention in the patch submission that this 
was generated code at all.  So naturally I reviewed the code on the basis 
that it was source code as defined in LGPLv2.1.  Likewise, I assumed for 
the original libmvec submission that it was a good-faith submission of 
free software source code.

Now that the issue has been raised of the code coming out of a compiler, 
there is a much higher review bar to be met to demonstrate that it is 
nevertheless in the preferred form for modification - because the default 
assumption that has to be overcome is that the output of a compiler is 
*not* source code, and if it is not source code, it is not free software 
and it is not acceptable for glibc.

Noah Goldstein Dec. 20, 2021, 11:11 p.m. UTC | #11

On Mon, Dec 20, 2021 at 4:57 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote:
>
> > We put a lot of effort in making the changes you requested a few weeks
> > ago, and we thought those were sufficient.
>
> A few weeks ago, there was no mention in the patch submission that this
> was generated code at all.  So naturally I reviewed the code on the basis
> that it was source code as defined in LGPLv2.1.  Likewise, I assumed for
> the original libmvec submission that it was a good-faith submission of
> free software source code.
>
> Now that the issue has been raised of the code coming out of a compiler,
> there is a much higher review bar to be met to demonstrate that it is
> nevertheless in the preferred form for modification - because the default
> assumption that has to be overcome is that the output of a compiler is
> *not* source code, and if it is not source code, it is not free software
> and it is not acceptable for glibc.

Think this code is "critical path" enough that it will be more
frustrating trying
to  get C to compile to the "right" assembly than to just use the assembly
directly.

The only issue is if the patches are pushed half-done where it is still
difficult to make changes to the source without going through the original
source + intel compiler.

>
> --
> Joseph S. Myers
> joseph@codesourcery.com

H.J. Lu Dec. 20, 2021, 11:58 p.m. UTC | #12

On Mon, Dec 20, 2021 at 3:12 PM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Mon, Dec 20, 2021 at 4:57 PM Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote:
> >
> > > We put a lot of effort in making the changes you requested a few weeks
> > > ago, and we thought those were sufficient.
> >
> > A few weeks ago, there was no mention in the patch submission that this
> > was generated code at all.  So naturally I reviewed the code on the basis
> > that it was source code as defined in LGPLv2.1.  Likewise, I assumed for
> > the original libmvec submission that it was a good-faith submission of
> > free software source code.
> >
> > Now that the issue has been raised of the code coming out of a compiler,
> > there is a much higher review bar to be met to demonstrate that it is
> > nevertheless in the preferred form for modification - because the default
> > assumption that has to be overcome is that the output of a compiler is
> > *not* source code, and if it is not source code, it is not free software
> > and it is not acceptable for glibc.
>
> Think this code is "critical path" enough that it will be more
> frustrating trying
> to  get C to compile to the "right" assembly than to just use the assembly
> directly.

True.  We do manual peephole optimization on these compiler generated
assembly codes.  The final assembly codes are the preferred form for
performance and accuracy.

> The only issue is if the patches are pushed half-done where it is still
> difficult to make changes to the source without going through the original
> source + intel compiler.

Agreed.  But it can be done with care:

commit 78c9ec9000f873abe7a15a91b87080a2e4308260
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Aug 20 06:42:24 2021 -0700

    x86-64: Optimize load of all bits set into ZMM register [BZ #28252]

    Optimize loads of all bits set into ZMM register in AVX512 SVML codes
    by replacing

            vpbroadcastq .L_2il0floatpacket.16(%rip), %zmmX

    and

            vmovups   .L_2il0floatpacket.13(%rip), %zmmX

    with
            vpternlogd $0xff, %zmmX, %zmmX, %zmmX

    This fixes BZ #28252.