soft-fp: Add __extendhfxf2 and __truncxfhf2

Message ID: 20210701154016.333268-1-hjl.tools@gmail.com
State: Committed
Commit: 8241409e29a347ff6613d28d13cb1c7cdf1ec888
Series: soft-fp: Add __extendhfxf2 and __truncxfhf2

Checks

Context                  Check    Description
dj/TryBot-apply_patch    success  Patch applied to master at the time it was sent
dj/TryBot-32bit          success  Build for i686

Commit Message

H.J. Lu July 1, 2021, 3:40 p.m. UTC
1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
2. Add __extendhfxf2 to truncate IEEE extended into IEEE half.

These are needed by x86 _Float16:

https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

support in GCC.
---
 soft-fp/extendhfxf2.c | 53 +++++++++++++++++++++++++++++++++++++++++++
 soft-fp/truncxfhf2.c  | 52 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)
 create mode 100644 soft-fp/extendhfxf2.c
 create mode 100644 soft-fp/truncxfhf2.c
  

Comments

Joseph Myers July 1, 2021, 5:56 p.m. UTC | #1
On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:

> 1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
> 2. Add __extendhfxf2 to truncate IEEE extended into IEEE half.

OK.  Note that the second one of those should be corrected in the commit 
message to refer to __truncxfhf2 not __extendhfxf2.

> These are needed by x86 _Float16:
> 
> https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

(As I see it, supporting _Float16 on x86 is largely orthogonal to the 
existence of a new instruction set extension with full _Float16 arithmetic 
support.  The x86_64 ABI for _Float16 / _Complex _Float16 argument passing 
and return makes sense for any x86_64 processor regardless of whether the 
hardware supports full _Float16 arithmetic (this new extension), only 
conversions (F16C / Ivy Bridge) or no _Float16 operations at all (older 
processors).  IEEE binary32 is wide enough that converting from binary16 
to binary32, doing arithmetic and converting back produces correctly 
rounded results for any of the basic +-*/ operations and so would be a 
suitable fallback for implementing them, though configuring excess 
precision as done for older AArch64 processors might be more efficient - 
and when only conversions between binary16 and binary32 are available in 
hardware as in F16C, you need to use software truncation from binary64 to 
binary16 to avoid double rounding.)
  
H.J. Lu July 1, 2021, 6:06 p.m. UTC | #2
On Thu, Jul 1, 2021 at 10:56 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:
>
> > 1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
> > 2. Add __extendhfxf2 to truncate IEEE extended into IEEE half.
>
> OK.  Note that the second one of those should be corrected in the commit
> message to refer to __truncxfhf2 not __extendhfxf2.

Thanks.  I pushed it with

1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
2. Add __truncxfhf2 to truncate IEEE extended into IEEE half.

> > These are needed by x86 _Float16:
> >
> > https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
>
> (As I see it, supporting _Float16 on x86 is largely orthogonal to the
> existence of a new instruction set extension with full _Float16 arithmetic
> support.  The x86_64 ABI for _Float16 / _Complex _Float16 argument passing
> and return makes sense for any x86_64 processor regardless of whether the
> hardware supports full _Float16 arithmetic (this new extension), only
> conversions (F16C / Ivy Bridge) or no _Float16 operations at all (older
> processors).  IEEE binary32 is wide enough that converting from binary16
> to binary32, doing arithmetic and converting back produces correctly
> rounded results for any of the basic +-*/ operations and so would be a
> suitable fallback for implementing them, though configuring excess
> precision as done for older AArch64 processors might be more efficient -
> and when only conversions between binary16 and binary32 are available in
> hardware as in F16C, you need to use software truncation from binary64 to
> binary16 to avoid double rounding.)
>

We can do _Float16 emulation like __float128 if AVX512FP16 isn't available.
We need to add AVX512FP16 variants for _Float16 builtin functions in libgcc.
  
Joseph Myers July 1, 2021, 6:17 p.m. UTC | #3
On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:

> We can do _Float16 emulation like __float128 if AVX512FP16 isn't available.
> We need to add AVX512FP16 variants for _Float16 builtin functions in libgcc.

Sure.  The handling on powerpc64le of the cases where hardware binary128 
support might or might not be available could serve as an example of how 
to handle such cases in libgcc.

The x86_64 ABI describes argument passing / return for _Float16 / _Complex 
_Float16.  The i386 ABI doesn't; ABI support will need to be added if 
those types are to be supported at all on i386 (in the 32-bit case, it 
seems appropriate for the ABI to be something valid given the base 
architecture only, so argument passing on the stack like for other types, 
return in general-purpose registers or memory) - or else make sure GCC 
disallows _Float16 in the 32-bit case in the absence of defined ABI 
support.
  
H.J. Lu July 1, 2021, 9:13 p.m. UTC | #4
On Thu, Jul 1, 2021 at 11:17 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 1 Jul 2021, H.J. Lu via Libc-alpha wrote:
>
> > We can do _Float16 emulation like __float128 if AVX512FP16 isn't available.
> > We need to add AVX512FP16 variants for _Float16 builtin functions in libgcc.
>
> Sure.  The handling on powerpc64le of the cases where hardware binary128
> support might or might not be available could serve as an example of how
> to handle such cases in libgcc.

Hongtao, Hongyu, can you take a look at how powerpc64le handles binary128?

> The x86_64 ABI describes argument passing / return for _Float16 / _Complex
> _Float16.  The i386 ABI doesn't; ABI support will need to be added if
> those types are to be supported at all on i386 (in the 32-bit case, it

https://groups.google.com/g/ia32-abi/c/Qy_r-tY5iQY

> seems appropriate for the ABI to be something valid given the base
> architecture only, so argument passing on the stack like for other types,
> return in general-purpose registers or memory) - or else make sure GCC
> disallows _Float16 in the 32-bit case in the absence of defined ABI
> support.
>

On i386,

1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
  

Patch

diff --git a/soft-fp/extendhfxf2.c b/soft-fp/extendhfxf2.c
new file mode 100644
index 0000000000..1cb5fef947
--- /dev/null
+++ b/soft-fp/extendhfxf2.c
@@ -0,0 +1,53 @@ 
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE extended.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "extended.h"
+
+XFtype
+__extendhfxf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_E (R);
+  XFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_EXTEND (E, H, 4, 1, R, A);
+#else
+  FP_EXTEND (E, H, 2, 1, R, A);
+#endif
+  FP_PACK_RAW_E (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/soft-fp/truncxfhf2.c b/soft-fp/truncxfhf2.c
new file mode 100644
index 0000000000..688ad24523
--- /dev/null
+++ b/soft-fp/truncxfhf2.c
@@ -0,0 +1,52 @@ 
+/* Software floating-point emulation.
+   Truncate IEEE extended into IEEE half.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "extended.h"
+
+HFtype
+__truncxfhf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_H (R);
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (H, E, 1, 4, R, A);
+#else
+  FP_TRUNC (H, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}