Message ID | 20211216001239.3648099-1-skpgkp2@gmail.com |
---|---|
Headers |
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 331053858423 for <patchwork@sourceware.org>; Thu, 16 Dec 2021 00:13:05 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 331053858423 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1639613585; bh=ncE4sLofJup8TglH+zNsFxQntubB9t/DuuoJOJ2q5hk=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=XN7Jtdkbe+kn06EoKmXj12AaVFw/qeR58KQUB9qTI6FFSgsogel0zt+x5opNAe2Iq N7kGUFJgL029BoEAPjK0zKymWepm/QwsxWwOSCAtAXn4jXXbQQJyycoGxXbc5pfcQa kDNOHNAGE/shUruelxPfFwHjEXulhWZrT5MdwmFg= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by sourceware.org (Postfix) with ESMTPS id AC94B3858402 for <libc-alpha@sourceware.org>; Thu, 16 Dec 2021 00:12:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AC94B3858402 X-IronPort-AV: E=McAfee;i="6200,9189,10199"; a="300141501" X-IronPort-AV: E=Sophos;i="5.88,209,1635231600"; d="scan'208";a="300141501" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Dec 2021 16:12:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,209,1635231600"; d="scan'208";a="568125819" Received: from scymds02.sc.intel.com ([10.82.73.244]) by fmsmga008.fm.intel.com with ESMTP; 15 Dec 2021 16:12:40 -0800 Received: from gskx-1.sc.intel.com (gskx-1.sc.intel.com [172.25.149.211]) by scymds02.sc.intel.com with ESMTP id 1BG0Cd9e028572; Wed, 15 Dec 2021 16:12:39 -0800 To: libc-alpha@sourceware.org Subject: [PATCH v4 0/1] Add vector math function acos/acosf to libmvec Date: Wed, 15 Dec 2021 16:12:38 -0800 Message-Id: <20211216001239.3648099-1-skpgkp2@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <CAFUsyfJ+sURnik75vJ5XXpS9FwZruoH4yNOECg6=eH3qH1uASg@mail.gmail.com> References: <CAFUsyfJ+sURnik75vJ5XXpS9FwZruoH4yNOECg6=eH3qH1uASg@mail.gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, FORGED_GMAIL_RCVD, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, HK_RANDOM_ENVFROM, HK_RANDOM_FROM, KAM_DMARC_NONE, KAM_DMARC_STATUS, NML_ADSP_CUSTOM_MED, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_SOFTFAIL, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: Sunil K Pandey via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Sunil K Pandey <skpgkp2@gmail.com> Cc: andrey.kolesov@intel.com Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
Add vector math function acos/acosf to libmvec
|
|
Message
Sunil Pandey
Dec. 16, 2021, 12:12 a.m. UTC
This is single function patch as suggested. We incorporated following changes in v4. Rest of the libmvec patches will follow similar change. Let me know if it looks reasonable? Changes from v3: - Remove exit call dead code. - Remove unnecessary save/restore. Changes from v2: - Keep cfi_escape for callee saved registers only. - Add DW_CFA_expression comments corresponding to each cfi_escape. - Define macro corresponding to each numeric data table offset. - Replace numeric data table offset with macro name. - Add data table structure definition as comments. - Restructure data table and add comments to each data field value. - Rename numeric sequential labels with meaningful label name. - Add more comments to labels as well as on call sites. - Internal special value processing paths replaced by calls to standard scalar math functions, makes code more compact and aligned with previous libmvec submission. Changes from v1: - Add ISA specific sections for all libmvec functions. - Add libmvec functions to math-vector-fortran.h. - Change label to sequential. - Fix function name in GNU header plate. This patch implements acos/acosf vector math functions containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests with regenerated ulps. Sunil K Pandey (1): x86-64: Add vector acos/acosf implementation to libmvec bits/libm-simd-decl-stubs.h | 11 + math/bits/mathcalls.h | 2 +- .../unix/sysv/linux/x86_64/libmvec.abilist | 8 + sysdeps/x86/fpu/bits/math-vector.h | 4 + .../x86/fpu/finclude/math-vector-fortran.h | 4 + sysdeps/x86_64/fpu/Makeconfig | 1 + sysdeps/x86_64/fpu/Versions | 4 + sysdeps/x86_64/fpu/libm-test-ulps | 20 + .../multiarch/.svml_s_acosf4_core_sse4.S.swp | Bin 0 -> 16384 bytes .../fpu/multiarch/ifunc-mathvec-avx512-skx.h | 39 ++ .../fpu/multiarch/svml_d_acos2_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_d_acos2_core.c | 27 ++ .../fpu/multiarch/svml_d_acos2_core_sse4.S | 369 ++++++++++++++++++ .../fpu/multiarch/svml_d_acos4_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_d_acos4_core.c | 27 ++ .../fpu/multiarch/svml_d_acos4_core_avx2.S | 335 ++++++++++++++++ .../fpu/multiarch/svml_d_acos8_core-avx2.S | 20 + .../x86_64/fpu/multiarch/svml_d_acos8_core.c | 27 ++ .../fpu/multiarch/svml_d_acos8_core_avx512.S | 365 +++++++++++++++++ .../fpu/multiarch/svml_s_acosf16_core-avx2.S | 20 + .../fpu/multiarch/svml_s_acosf16_core.c | 28 ++ .../multiarch/svml_s_acosf16_core_avx512.S | 311 +++++++++++++++ .../fpu/multiarch/svml_s_acosf4_core-sse2.S | 20 + .../x86_64/fpu/multiarch/svml_s_acosf4_core.c | 28 ++ .../fpu/multiarch/svml_s_acosf4_core_sse4.S | 321 +++++++++++++++ .../fpu/multiarch/svml_s_acosf8_core-sse.S | 20 + .../x86_64/fpu/multiarch/svml_s_acosf8_core.c | 28 ++ .../fpu/multiarch/svml_s_acosf8_core_avx2.S | 299 ++++++++++++++ sysdeps/x86_64/fpu/svml_d_acos2_core.S | 29 ++ sysdeps/x86_64/fpu/svml_d_acos4_core.S | 29 ++ sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S | 25 ++ sysdeps/x86_64/fpu/svml_d_acos8_core.S | 25 ++ sysdeps/x86_64/fpu/svml_s_acosf16_core.S | 25 ++ sysdeps/x86_64/fpu/svml_s_acosf4_core.S | 29 ++ sysdeps/x86_64/fpu/svml_s_acosf8_core.S | 29 ++ sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S | 25 ++ .../x86_64/fpu/test-double-libmvec-acos-avx.c | 1 + .../fpu/test-double-libmvec-acos-avx2.c | 1 + .../fpu/test-double-libmvec-acos-avx512f.c | 1 + sysdeps/x86_64/fpu/test-double-libmvec-acos.c | 3 + .../x86_64/fpu/test-double-vlen2-wrappers.c | 1 + .../fpu/test-double-vlen4-avx2-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen4-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen8-wrappers.c | 1 + .../x86_64/fpu/test-float-libmvec-acosf-avx.c | 1 + .../fpu/test-float-libmvec-acosf-avx2.c | 1 + .../fpu/test-float-libmvec-acosf-avx512f.c | 1 + sysdeps/x86_64/fpu/test-float-libmvec-acosf.c | 3 + .../x86_64/fpu/test-float-vlen16-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen4-wrappers.c | 1 + .../fpu/test-float-vlen8-avx2-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen8-wrappers.c | 1 + 52 files changed, 2613 insertions(+), 1 deletion(-) create mode 100644 sysdeps/x86_64/fpu/multiarch/.svml_s_acosf4_core_sse4.S.swp create mode 100644 sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos2_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c
Comments
I'd like to know more about where this assembly code came from. Is it manually written? Generated by a compiler and then manually edited, as suggested in previous discussions? If it was generated by a compiler but then sufficiently edited that the assembly version is now the preferred form for subsequent modifications, it would still be helpful to see the original input to that compiler (e.g. as a comment in the .S files), in order to assess the relative merits of C and assembly sources for this code and the possibilities of adapting to other architectures - and also to see the original compiler output so we can judge the extent of the editing and work out what is in fact the preferred form for modifications.
On 16/12/2021 16:18, Joseph Myers wrote: > I'd like to know more about where this assembly code came from. Is it > manually written? Generated by a compiler and then manually edited, as > suggested in previous discussions? If it was generated by a compiler but > then sufficiently edited that the assembly version is now the preferred > form for subsequent modifications, it would still be helpful to see the > original input to that compiler (e.g. as a comment in the .S files), in > order to assess the relative merits of C and assembly sources for this > code and the possibilities of adapting to other architectures - and also > to see the original compiler output so we can judge the extent of the > editing and work out what is in fact the preferred form for modifications. > Besides it I am curious why it can't be coded in C with proper builtins plus some portability wrapper, as libraries like SLEEF does. With some care we can make generic code that can ported to other vector ABIs, and add optimizations only its is really requires (as FMA for the newer math functions).
On Thu, Dec 16, 2021 at 06:13:32PM -0300, GNU C Library wrote: > > > On 16/12/2021 16:18, Joseph Myers wrote: > > I'd like to know more about where this assembly code came from. Is it > > manually written? Generated by a compiler and then manually edited, as > > suggested in previous discussions? If it was generated by a compiler but > > then sufficiently edited that the assembly version is now the preferred > > form for subsequent modifications, it would still be helpful to see the > > original input to that compiler (e.g. as a comment in the .S files), in > > order to assess the relative merits of C and assembly sources for this > > code and the possibilities of adapting to other architectures - and also > > to see the original compiler output so we can judge the extent of the > > editing and work out what is in fact the preferred form for modifications. > > > > > Besides it I am curious why it can't be coded in C with proper builtins > plus some portability wrapper, as libraries like SLEEF does. With some > care we can make generic code that can ported to other vector ABIs, and > add optimizations only its is really requires (as FMA for the newer math > functions). libmvec functions are from Intel SVML library, which are x86-64 sepecific. They need to be compiled by a special version of Intel compiler for correctness, performance and accuracy. We do post-processing on Intel compiler generated codes for glibc contribution. H.J.
On 19/12/2021 17:34, H.J. Lu wrote: > On Thu, Dec 16, 2021 at 06:13:32PM -0300, GNU C Library wrote: >> >> >> On 16/12/2021 16:18, Joseph Myers wrote: >>> I'd like to know more about where this assembly code came from. Is it >>> manually written? Generated by a compiler and then manually edited, as >>> suggested in previous discussions? If it was generated by a compiler but >>> then sufficiently edited that the assembly version is now the preferred >>> form for subsequent modifications, it would still be helpful to see the >>> original input to that compiler (e.g. as a comment in the .S files), in >>> order to assess the relative merits of C and assembly sources for this >>> code and the possibilities of adapting to other architectures - and also >>> to see the original compiler output so we can judge the extent of the >>> editing and work out what is in fact the preferred form for modifications. >>> >> >> >> Besides it I am curious why it can't be coded in C with proper builtins >> plus some portability wrapper, as libraries like SLEEF does. With some >> care we can make generic code that can ported to other vector ABIs, and >> add optimizations only its is really requires (as FMA for the newer math >> functions). > > libmvec functions are from Intel SVML library, which are x86-64 sepecific. > They need to be compiled by a special version of Intel compiler for > correctness, performance and accuracy. We do post-processing on Intel > compiler generated codes for glibc contribution. From the weekly call, If I understood correctly these assembly routines are build from another high level language to C implementation and then the Intel compiler will generate the assembly which will post-processed in the current form. I am not sure if this is the best course of action, specially because we don't have access neither to the high-level code, nor to theused compiler to generate such assembly routines. I give that the resulting code might better than C code the tool dumps, but all these process are really hard to justify in a open-source tool like glibc. What I meant why not use something like SLEEF is that using the C as the base source for the high code generation might better to describe the whole process and generates more understandably and maintainable code that could be reused more easily to different architecture and vector ABIs. I give you that it might no generate the most optimized and fine-tuned code, but it also leverages compiler support and move to the current practice to remove arch-specific tuned implementations in favor of C generic ones. For instance, SLEEF code for vector cosf implementation [1], uses slim wrapper (vlt_vo_vf_vf for instance) that is reimplemented by each vector ABI [1]. It can be easily adapted to different vector ABIs without much work. [1] https://github.com/shibatch/sleef/blob/master/src/libm/sleefsimdsp.c [2] https://github.com/shibatch/sleef/blob/master/src/arch/helperavx512f.h
On Mon, Dec 20, 2021 at 11:10 AM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 19/12/2021 17:34, H.J. Lu wrote: > > On Thu, Dec 16, 2021 at 06:13:32PM -0300, GNU C Library wrote: > >> > >> > >> On 16/12/2021 16:18, Joseph Myers wrote: > >>> I'd like to know more about where this assembly code came from. Is it > >>> manually written? Generated by a compiler and then manually edited, as > >>> suggested in previous discussions? If it was generated by a compiler but > >>> then sufficiently edited that the assembly version is now the preferred > >>> form for subsequent modifications, it would still be helpful to see the > >>> original input to that compiler (e.g. as a comment in the .S files), in > >>> order to assess the relative merits of C and assembly sources for this > >>> code and the possibilities of adapting to other architectures - and also > >>> to see the original compiler output so we can judge the extent of the > >>> editing and work out what is in fact the preferred form for modifications. > >>> > >> > >> > >> Besides it I am curious why it can't be coded in C with proper builtins > >> plus some portability wrapper, as libraries like SLEEF does. With some > >> care we can make generic code that can ported to other vector ABIs, and > >> add optimizations only its is really requires (as FMA for the newer math > >> functions). > > > > libmvec functions are from Intel SVML library, which are x86-64 sepecific. > > They need to be compiled by a special version of Intel compiler for > > correctness, performance and accuracy. We do post-processing on Intel > > compiler generated codes for glibc contribution. > > From the weekly call, If I understood correctly these assembly routines > are build from another high level language to C implementation and then > the Intel compiler will generate the assembly which will post-processed in > the current form. > > I am not sure if this is the best course of action, specially because > we don't have access neither to the high-level code, nor to theused compiler > to generate such assembly routines. I give that the resulting code might > better than C code the tool dumps, but all these process are really hard to > justify in a open-source tool like glibc. Intel is committed to supporting x86 assembly codes in glibc. > What I meant why not use something like SLEEF is that using the C as > the base source for the high code generation might better to describe > the whole process and generates more understandably and maintainable > code that could be reused more easily to different architecture and vector > ABIs. > > I give you that it might no generate the most optimized and fine-tuned > code, but it also leverages compiler support and move to the current > practice to remove arch-specific tuned implementations in favor of > C generic ones. > > For instance, SLEEF code for vector cosf implementation [1], uses slim > wrapper (vlt_vo_vf_vf for instance) that is reimplemented by each vector > ABI [1]. It can be easily adapted to different vector ABIs without > much work. > > [1] https://github.com/shibatch/sleef/blob/master/src/libm/sleefsimdsp.c > [2] https://github.com/shibatch/sleef/blob/master/src/arch/helperavx512f.h The Intel SVML team compared SVML vs SLEEF a while back. SVML functions were ~1.5x faster on average than those from SLEEF. One major issue with SLEEF is that it lacks support for large arguments. For x86-64 libmvec in glibc 2.35, we want assembly codes from SVML. We can revisit it in the future when performance and accuracy of SLEEF have been improved. Thanks.
On Sun, 19 Dec 2021, H.J. Lu via Libc-alpha wrote: > libmvec functions are from Intel SVML library, which are x86-64 sepecific. > They need to be compiled by a special version of Intel compiler for > correctness, performance and accuracy. We do post-processing on Intel > compiler generated codes for glibc contribution. Please provide the input to that generation process (the original human-written code), the output and details of the post-processing done. We can't accept this code unless the assembly code is actually the preferred form for modification - so that it would make more sense to change the algorithm used by changing the assembly code rather than changing some step of the generation process, for example. To judge that we need to see the generation inputs and outputs.
Hello Joseph, I am sorry, but the original code was written in an internal Intel format used to generate multiple math libraries (not just SVML), and is not written in a human-readable format, unless one has extensive knowledge of relevant Intel processes. As such, it cannot be shared - it would not help at all. The situation was the same with our previous libmvec contribution, of 2016. Thank you, Marius Cornea -----Original Message----- From: Joseph Myers <joseph@codesourcery.com> Sent: Monday, December 20, 2021 1:41 PM To: H.J. Lu <hjl.tools@gmail.com> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>; libc-alpha@sourceware.org; Kolesov, Andrey <Andrey.Kolesov@intel.com>; Cornea, Marius <marius.cornea@intel.com> Subject: Re: [PATCH v4 0/1] Add vector math function acos/acosf to libmvec On Sun, 19 Dec 2021, H.J. Lu via Libc-alpha wrote: > libmvec functions are from Intel SVML library, which are x86-64 sepecific. > They need to be compiled by a special version of Intel compiler for > correctness, performance and accuracy. We do post-processing on Intel > compiler generated codes for glibc contribution. Please provide the input to that generation process (the original human-written code), the output and details of the post-processing done. We can't accept this code unless the assembly code is actually the preferred form for modification - so that it would make more sense to change the algorithm used by changing the assembly code rather than changing some step of the generation process, for example. To judge that we need to see the generation inputs and outputs.
On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote: > I am sorry, but the original code was written in an internal Intel > format used to generate multiple math libraries (not just SVML), and is > not written in a human-readable format, unless one has extensive > knowledge of relevant Intel processes. As such, it cannot be shared - it > would not help at all. That doesn't convince me that the assembly code proposed now is source code (the preferred form for modifications). If you can't demonstrate that via showing the extent of manual modifications to the output is sufficient that the output has become the preferred form for further modifications, you'll need to find some other way to demonstrate that this contribution actually consists of source code and is free software at all. > The situation was the same with our previous libmvec contribution, of 2016. Well, if it turns out that's not actually the preferred form for modifications, we'll need to remove it from all glibc branches with that code on (and replace it with dumb wrappers round scalar functions to preserve the ABI, I suppose).
Hi Joseph, We put a lot of effort in making the changes you requested a few weeks ago, and we thought those were sufficient. Accepting this code now for GLIBC 2.35, updated according with your requests and at the level of the 2016 contributions, would benefit a lot is GCC/GLIBC users on all x86 CPUs. We can try to find a better solution in the future and we are willing to help, but that will take significantly more time. Maybe H.J. and Sunil can add to this. Thanks, Marius Cornea -----Original Message----- From: Joseph Myers <joseph@codesourcery.com> Sent: Monday, December 20, 2021 2:19 PM To: Cornea, Marius <marius.cornea@intel.com> Cc: H.J. Lu <hjl.tools@gmail.com>; Kolesov, Andrey <Andrey.Kolesov@intel.com>; libc-alpha@sourceware.org Subject: RE: [PATCH v4 0/1] Add vector math function acos/acosf to libmvec On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote: > I am sorry, but the original code was written in an internal Intel > format used to generate multiple math libraries (not just SVML), and > is not written in a human-readable format, unless one has extensive > knowledge of relevant Intel processes. As such, it cannot be shared - > it would not help at all. That doesn't convince me that the assembly code proposed now is source code (the preferred form for modifications). If you can't demonstrate that via showing the extent of manual modifications to the output is sufficient that the output has become the preferred form for further modifications, you'll need to find some other way to demonstrate that this contribution actually consists of source code and is free software at all. > The situation was the same with our previous libmvec contribution, of 2016. Well, if it turns out that's not actually the preferred form for modifications, we'll need to remove it from all glibc branches with that code on (and replace it with dumb wrappers round scalar functions to preserve the ABI, I suppose). -- Joseph S. Myers joseph@codesourcery.com
On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote: > We put a lot of effort in making the changes you requested a few weeks > ago, and we thought those were sufficient. A few weeks ago, there was no mention in the patch submission that this was generated code at all. So naturally I reviewed the code on the basis that it was source code as defined in LGPLv2.1. Likewise, I assumed for the original libmvec submission that it was a good-faith submission of free software source code. Now that the issue has been raised of the code coming out of a compiler, there is a much higher review bar to be met to demonstrate that it is nevertheless in the preferred form for modification - because the default assumption that has to be overcome is that the output of a compiler is *not* source code, and if it is not source code, it is not free software and it is not acceptable for glibc.
On Mon, Dec 20, 2021 at 4:57 PM Joseph Myers <joseph@codesourcery.com> wrote: > > On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote: > > > We put a lot of effort in making the changes you requested a few weeks > > ago, and we thought those were sufficient. > > A few weeks ago, there was no mention in the patch submission that this > was generated code at all. So naturally I reviewed the code on the basis > that it was source code as defined in LGPLv2.1. Likewise, I assumed for > the original libmvec submission that it was a good-faith submission of > free software source code. > > Now that the issue has been raised of the code coming out of a compiler, > there is a much higher review bar to be met to demonstrate that it is > nevertheless in the preferred form for modification - because the default > assumption that has to be overcome is that the output of a compiler is > *not* source code, and if it is not source code, it is not free software > and it is not acceptable for glibc. Think this code is "critical path" enough that it will be more frustrating trying to get C to compile to the "right" assembly than to just use the assembly directly. The only issue is if the patches are pushed half-done where it is still difficult to make changes to the source without going through the original source + intel compiler. > > -- > Joseph S. Myers > joseph@codesourcery.com
On Mon, Dec 20, 2021 at 3:12 PM Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org> wrote: > > On Mon, Dec 20, 2021 at 4:57 PM Joseph Myers <joseph@codesourcery.com> wrote: > > > > On Mon, 20 Dec 2021, Cornea, Marius via Libc-alpha wrote: > > > > > We put a lot of effort in making the changes you requested a few weeks > > > ago, and we thought those were sufficient. > > > > A few weeks ago, there was no mention in the patch submission that this > > was generated code at all. So naturally I reviewed the code on the basis > > that it was source code as defined in LGPLv2.1. Likewise, I assumed for > > the original libmvec submission that it was a good-faith submission of > > free software source code. > > > > Now that the issue has been raised of the code coming out of a compiler, > > there is a much higher review bar to be met to demonstrate that it is > > nevertheless in the preferred form for modification - because the default > > assumption that has to be overcome is that the output of a compiler is > > *not* source code, and if it is not source code, it is not free software > > and it is not acceptable for glibc. > > Think this code is "critical path" enough that it will be more > frustrating trying > to get C to compile to the "right" assembly than to just use the assembly > directly. True. We do manual peephole optimization on these compiler generated assembly codes. The final assembly codes are the preferred form for performance and accuracy. > The only issue is if the patches are pushed half-done where it is still > difficult to make changes to the source without going through the original > source + intel compiler. Agreed. But it can be done with care: commit 78c9ec9000f873abe7a15a91b87080a2e4308260 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Aug 20 06:42:24 2021 -0700 x86-64: Optimize load of all bits set into ZMM register [BZ #28252] Optimize loads of all bits set into ZMM register in AVX512 SVML codes by replacing vpbroadcastq .L_2il0floatpacket.16(%rip), %zmmX and vmovups .L_2il0floatpacket.13(%rip), %zmmX with vpternlogd $0xff, %zmmX, %zmmX, %zmmX This fixes BZ #28252.