soft-fp: Add __extendhfxf2 and __truncxfhf2
Checks
Context | Check | Description
dj/TryBot-apply_patch | success | Patch applied to master at the time it was sent
dj/TryBot-32bit | success | Build for i686
Commit Message
1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
2. Add __extendhfxf2 to truncate IEEE extended into IEEE half.
These are needed by x86 _Float16:
https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
support in GCC.
---
soft-fp/extendhfxf2.c | 53 +++++++++++++++++++++++++++++++++++++++++++
soft-fp/truncxfhf2.c | 52 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 105 insertions(+)
create mode 100644 soft-fp/extendhfxf2.c
create mode 100644 soft-fp/truncxfhf2.c
Comments
On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:
> 1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
> 2. Add __extendhfxf2 to truncate IEEE extended into IEEE half.
OK. Note that the second one of those should be corrected in the commit
message to refer to __truncxfhf2 not __extendhfxf2.
> These are needed by x86 _Float16:
>
> https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
(As I see it, supporting _Float16 on x86 is largely orthogonal to the
existence of a new instruction set extension with full _Float16 arithmetic
support. The x86_64 ABI for _Float16 / _Complex _Float16 argument passing
and return makes sense for any x86_64 processor regardless of whether the
hardware supports full _Float16 arithmetic (this new extension), only
conversions (F16C / Ivy Bridge) or no _Float16 operations at all (older
processors). IEEE binary32 is wide enough that converting from binary16
to binary32, doing arithmetic and converting back produces correctly
rounded results for any of the basic +-*/ operations and so would be a
suitable fallback for implementing them, though configuring excess
precision as done for older AArch64 processors might be more efficient -
and when only conversions between binary16 and binary32 are available in
hardware as in F16C, you need to use software truncation from binary64 to
binary16 to avoid double rounding.)
On Thu, Jul 1, 2021 at 10:56 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:
>
> > 1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
> > 2. Add __extendhfxf2 to truncate IEEE extended into IEEE half.
>
> OK. Note that the second one of those should be corrected in the commit
> message to refer to __truncxfhf2 not __extendhfxf2.
Thanks. I pushed it with
1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
2. Add __truncxfhf2 to truncate IEEE extended into IEEE half.
> > These are needed by x86 _Float16:
> >
> > https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
>
> (As I see it, supporting _Float16 on x86 is largely orthogonal to the
> existence of a new instruction set extension with full _Float16 arithmetic
> support. The x86_64 ABI for _Float16 / _Complex _Float16 argument passing
> and return makes sense for any x86_64 processor regardless of whether the
> hardware supports full _Float16 arithmetic (this new extension), only
> conversions (F16C / Ivy Bridge) or no _Float16 operations at all (older
> processors). IEEE binary32 is wide enough that converting from binary16
> to binary32, doing arithmetic and converting back produces correctly
> rounded results for any of the basic +-*/ operations and so would be a
> suitable fallback for implementing them, though configuring excess
> precision as done for older AArch64 processors might be more efficient -
> and when only conversions between binary16 and binary32 are available in
> hardware as in F16C, you need to use software truncation from binary64 to
> binary16 to avoid double rounding.)
>
We can do _Float16 emulation like __float128 if AVX512FP16 isn't available.
We need to add AVX512FP16 variants for _Float16 builtin functions in libgcc.
On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:
> We can do _Float16 emulation like __float128 if AVX512FP16 isn't available.
> We need to add AVX512FP16 variants for _Float16 builtin functions in libgcc.
Sure. The handling on powerpc64le of the cases where hardware binary128
support might or might not be available could serve as an example of how
to handle such cases in libgcc.
The x86_64 ABI describes argument passing / return for _Float16 / _Complex
_Float16. The i386 ABI doesn't; ABI support will need to be added if
those types are to be supported at all on i386 (in the 32-bit case, it
seems appropriate for the ABI to be something valid given the base
architecture only, so argument passing on the stack like for other types,
return in general-purpose registers or memory) - or else make sure GCC
disallows _Float16 in the 32-bit case in the absence of defined ABI
support.
On Thu, Jul 1, 2021 at 11:17 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:
>
> > We can do _Float16 emulation like __float128 if AVX512FP16 isn't available.
> > We need to add AVX512FP16 variants for _Float16 builtin functions in libgcc.
>
> Sure. The handling on powerpc64le of the cases where hardware binary128
> support might or might not be available could serve as an example of how
> to handle such cases in libgcc.
Hongtao, Hongyu, can you take a look at how powerpc64le handles binary128?
> The x86_64 ABI describes argument passing / return for _Float16 / _Complex
> _Float16. The i386 ABI doesn't; ABI support will need to be added if
> those types are to be supported at all on i386 (in the 32-bit case, it
https://groups.google.com/g/ia32-abi/c/Qy_r-tY5iQY
> seems appropriate for the ABI to be something valid given the base
> architecture only, so argument passing on the stack like for other types,
> return in general-purpose registers or memory) - or else make sure GCC
> disallows _Float16 in the 32-bit case in the absence of defined ABI
> support.
>
On i386,
1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
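diff --git a/soft-fp/extendhfxf2.c b/soft-fp/extendhfxf2.c
new file mode 100644
--- /dev/null
+++ b/soft-fp/extendhfxf2.c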
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+ Return an IEEE half converted to IEEE extended.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ In addition to the permissions in the GNU Lesser General Public
+ License, the Free Software Foundation gives you unlimited
+ permission to link the compiled version of this file into
+ combinations with other programs, and to distribute those
+ combinations without any restriction coming from the use of this
+ file. (The Lesser General Public License restrictions do apply in
+ other respects; for example, they cover modification of the file,
+ and distribution when not linked into a combine executable.)
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "extended.h"
+
+XFtype
+__extendhfxf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_E (R);
+  XFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_EXTEND (E, H, 4, 1, R, A);
+#else
+  FP_EXTEND (E, H, 2, 1, R, A);
+#endif
+  FP_PACK_RAW_E (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/soft-fp/truncxfhf2.c b/soft-fp/truncxfhf2.c
new file mode 100644
--- /dev/null
+++ b/soft-fp/truncxfhf2.c
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+ Truncate IEEE extended into IEEE half.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ In addition to the permissions in the GNU Lesser General Public
+ License, the Free Software Foundation gives you unlimited
+ permission to link the compiled version of this file into
+ combinations with other programs, and to distribute those
+ combinations without any restriction coming from the use of this
+ file. (The Lesser General Public License restrictions do apply in
+ other respects; for example, they cover modification of the file,
+ and distribution when not linked into a combine executable.)
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "extended.h"
+
+HFtype
+__truncxfhf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_H (R);
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (H, E, 1, 4, R, A);
+#else
+  FP_TRUNC (H, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}