[0/5] IEEE 128-bit built-in overload support.

Message ID YuIUBVkLjqjYMZhp@toto.the-meissners.org

Message

Michael Meissner July 28, 2022, 4:43 a.m. UTC
  The following patches add support for doing built-in function overloading
between the two 128-bit IEEE types (i.e. _Float128/__float128 using KFmode, and
long double using TFmode when it uses the IEEE 128-bit encoding).

These patches lay the foundation for a set of follow-on patches that will
change the internal handling of 128-bit floating point types in GCC.  In the
future patches, I hope to change the compiler to always use KFmode for the
explicit _Float128/__float128 types, to always use TFmode for the long double
type, no matter which 128-bit floating point type is used, and IFmode for the
explicit __ibm128 type.

But before I can submit those patches to change the internal type structure, I
need to make sure that the built-in functions can handle both sets of types,
and the overload mechanism automatically switches between the two.

There are 5 patches in the series.

The first patch adds the infrastructure to the built-in mechanism to deal with
long doubles that use the IEEE 128-bit encoding.

The second patch adds overload support to the IEEE 128-bit round to odd
built-in functions.

The third patch adds overload support to the IEEE 128-bit comparison built-in
functions.

The fourth patch adds overload support to the IEEE 128-bit scalar extract field
and insert field built-in functions.

The fifth patch adds overload support to the IEEE 128-bit test data and test
data negate built-in functions.

I have tested these patches on a power10 that is running Fedora 36, which
defaults to using long doubles that are IEEE 128-bit.  I have built two
parallel GCC compilers, one that defaults to using IEEE 128-bit long doubles
and one that defaults to using IBM 128-bit long doubles.

I have compared the test results to the original compiler results, comparing a
modified GCC to the original compiler using an IEEE 128-bit long double
default, and also comparing a modified GCC to the original compiler using an
IBM 128-bit long double default.  In both cases, the results are the same.

I have also compared the compilers with the future patch in progress that does
switch the internal type handling.  Once those patches are installed, the
overload mechanism will ensure the correct built-in is used.

Can I install these patches to the trunk?
  

Comments

Michael Meissner Aug. 3, 2022, 5:58 p.m. UTC | #1
Ping patches.

Patch #1 of 5.
| Date: Thu, 28 Jul 2022 00:47:13 -0400
| Subject: [PATCH 1/5] IEEE 128-bit built-in overload support.
| Message-ID: <YuIU0Yj4mu8LASSd@toto.the-meissners.org>

Patch #2 of 5.
| Date: Thu, 28 Jul 2022 00:48:51 -0400
| Subject: [PATCH 2/5] Support IEEE 128-bit overload round_to_odd built-in functions.
| Message-ID: <YuIVM+APJ5g/Yzcv@toto.the-meissners.org>

Patch #3 of 5.
| Date: Thu, 28 Jul 2022 00:50:43 -0400
| Subject: [PATCH 3/5] Support IEEE 128-bit overload comparison built-in functions.
| Message-ID: <YuIVo7MN5hmUzlOr@toto.the-meissners.org>

Patch #4 of 5.
| Date: Thu, 28 Jul 2022 00:52:38 -0400
| Subject: [PATCH 4/5] Support IEEE 128-bit overload extract and insert built-in functions.
| Message-ID: <YuIWFhnEXlfee42q@toto.the-meissners.org>

Patch #5 of 5.
| Date: Thu, 28 Jul 2022 00:54:15 -0400
| Subject: [PATCH 5/5] Support IEEE 128-bit overload test data built-in functions.
| Message-ID: <YuIWd+k7A3+lf6Hd@toto.the-meissners.org>
  
Segher Boessenkool Aug. 5, 2022, 6:19 p.m. UTC | #2
On Thu, Jul 28, 2022 at 12:43:49AM -0400, Michael Meissner wrote:
> These patches lay the foundation for a set of follow-on patches that will
> change the internal handling of 128-bit floating point types in GCC.  In the
> future patches, I hope to change the compiler to always use KFmode for the
> explicit _Float128/__float128 types, to always use TFmode for the long double
> type, no matter which 128-bit floating point type is used, and IFmode for the
> explicit __ibm128 type.

Making TFmode different from KFmode and IFmode is not an improvement.
NAK.


Segher
  
Michael Meissner Aug. 10, 2022, 6:23 a.m. UTC | #3
On Fri, Aug 05, 2022 at 01:19:05PM -0500, Segher Boessenkool wrote:
> On Thu, Jul 28, 2022 at 12:43:49AM -0400, Michael Meissner wrote:
> > These patches lay the foundation for a set of follow-on patches that will
> > change the internal handling of 128-bit floating point types in GCC.  In the
> > future patches, I hope to change the compiler to always use KFmode for the
> > explicit _Float128/__float128 types, to always use TFmode for the long double
> > type, no matter which 128-bit floating point type is used, and IFmode for the
> > explicit __ibm128 type.
> 
> Making TFmode different from KFmode and IFmode is not an improvement.
> NAK.
> 
> 
> Segher

First of all, it already IS different from KFmode and IFmode, as we've talked
about.  I'm trying to clean this mess up.  Having explicit __float128's being
converted to TFmode if -mabi=ieeelongdouble is just as bad, and it means that
_Float128 and __float128 are not the same type.

What I'm trying to eliminate is the code in rs6000-builtin.cc that overrides
the builtin ops (i.e. it does the equivalent of an overloaded function):

  /* TODO: The following commentary and code is inherited from the original
     builtin processing code.  The commentary is a bit confusing, with the
     intent being that KFmode is always IEEE-128, IFmode is always IBM
     double-double, and TFmode is the current long double.  The code is
     confusing in that it converts from KFmode to TFmode pattern names,
     when the other direction is more intuitive.  Try to address this.  */

  /* We have two different modes (KFmode, TFmode) that are the IEEE
     128-bit floating point type, depending on whether long double is the
     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
     It is simpler if we only define one variant of the built-in function,
     and switch the code when defining it, rather than defining two built-
     ins and using the overload table in rs6000-c.cc to switch between the
     two.  If we don't have the proper assembler, don't do this switch
     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
  if (FLOAT128_IEEE_P (TFmode))
    switch (icode)
      {
      case CODE_FOR_sqrtkf2_odd:
	icode = CODE_FOR_sqrttf2_odd;
	break;
      case CODE_FOR_trunckfdf2_odd:
	icode = CODE_FOR_trunctfdf2_odd;
	break;
      case CODE_FOR_addkf3_odd:
	icode = CODE_FOR_addtf3_odd;
	break;
      case CODE_FOR_subkf3_odd:
	icode = CODE_FOR_subtf3_odd;
	break;
      case CODE_FOR_mulkf3_odd:
	icode = CODE_FOR_multf3_odd;
	break;
      case CODE_FOR_divkf3_odd:
	icode = CODE_FOR_divtf3_odd;
	break;
      case CODE_FOR_fmakf4_odd:
	icode = CODE_FOR_fmatf4_odd;
	break;
      case CODE_FOR_xsxexpqp_kf:
	icode = CODE_FOR_xsxexpqp_tf;
	break;
      case CODE_FOR_xsxsigqp_kf:
	icode = CODE_FOR_xsxsigqp_tf;
	break;
      case CODE_FOR_xststdcnegqp_kf:
	icode = CODE_FOR_xststdcnegqp_tf;
	break;
      case CODE_FOR_xsiexpqp_kf:
	icode = CODE_FOR_xsiexpqp_tf;
	break;
      case CODE_FOR_xsiexpqpf_kf:
	icode = CODE_FOR_xsiexpqpf_tf;
	break;
      case CODE_FOR_xststdcqp_kf:
	icode = CODE_FOR_xststdcqp_tf;
	break;
      case CODE_FOR_xscmpexpqp_eq_kf:
	icode = CODE_FOR_xscmpexpqp_eq_tf;
	break;
      case CODE_FOR_xscmpexpqp_lt_kf:
	icode = CODE_FOR_xscmpexpqp_lt_tf;
	break;
      case CODE_FOR_xscmpexpqp_gt_kf:
	icode = CODE_FOR_xscmpexpqp_gt_tf;
	break;
      case CODE_FOR_xscmpexpqp_unordered_kf:
	icode = CODE_FOR_xscmpexpqp_unordered_tf;
	break;
      default:
	break;
      }

    // ... other code

  if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
    {
      if (fcode == RS6000_BIF_PACK_IF)
	{
	  icode = CODE_FOR_packtf;
	  fcode = RS6000_BIF_PACK_TF;
	  uns_fcode = (size_t) fcode;
	}
      else if (fcode == RS6000_BIF_UNPACK_IF)
	{
	  icode = CODE_FOR_unpacktf;
	  fcode = RS6000_BIF_UNPACK_TF;
	  uns_fcode = (size_t) fcode;
	}
    }

In particular, without overloaded built-ins, we likely have something similar
to the above to cover all of the built-ins for both modes.  I tend to think
overloading is more natural in this case.
  
Segher Boessenkool Aug. 10, 2022, 5:03 p.m. UTC | #4
On Wed, Aug 10, 2022 at 02:23:27AM -0400, Michael Meissner wrote:
> On Fri, Aug 05, 2022 at 01:19:05PM -0500, Segher Boessenkool wrote:
> > On Thu, Jul 28, 2022 at 12:43:49AM -0400, Michael Meissner wrote:
> > > These patches lay the foundation for a set of follow-on patches that will
> > > change the internal handling of 128-bit floating point types in GCC.  In the
> > > future patches, I hope to change the compiler to always use KFmode for the
> > > explicit _Float128/__float128 types, to always use TFmode for the long double
> > > type, no matter which 128-bit floating point type is used, and IFmode for the
> > > explicit __ibm128 type.
> > 
> > Making TFmode different from KFmode and IFmode is not an improvement.
> > NAK.
> 
> First of all, it already IS different from KFmode and IFmode, as we've talked
> about.

It always is the same as either IFmode or KFmode in the end.  It is a
separate mode, yes, because generic code always wants to use TFmode.

> I'm trying to clean this mess up.  Having explicit __float128's being
> converted to TFmode if -mabi=ieeelongdouble is just as bad, and it means that
> _Float128 and __float128 are not the same type.

What do types have to do with this at all?

If TFmode means IEEE QP float, TFmode and KFmode can be used
interchangeably.  When TFmode means double-double, TFmode and IFmode can
be used interchangeably.  We should never depend on TFmode being
different from both underlying modes, that way madness lies.

If you remember, in 2016 or such I experimented with making TFmode a
macro-like thingie, so that we always get KFmode and IFmode in the
instruction stream.  This did not work because of the fundamental
problem that KFmode and IFmode cannot be ordered: for both modes there
are numbers it can represent that cannot be represented in the other
mode; converting from IFmode to KFmode is lossy for some numbers, and
the same is true for converting from KFmode to IFmode.  But, some
internals of GCC require all pairs of floating point modes (that can be
converted between at least) to be comparable (in the mathematical sense).
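
To make the ordering problem concrete, here is a minimal sketch in portable C.
It is an illustration only, not GCC code: struct dd, dd_fits_binary128 and
pow2_fits_dd are hypothetical names, plain doubles stand in for the two halves
of an IBM double-double, and the bit-span check assumes both halves are
positive, nonzero and non-overlapping.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an IBM double-double: the value is hi + lo.  */
struct dd { double hi; double lo; };

/* Power of two of the lowest set bit of a finite, nonzero, positive double.  */
static int lowest_bit (double x)
{
  int e;
  double m = frexp (x, &e);                   /* x = m * 2^e, 0.5 <= m < 1 */
  uint64_t bits = (uint64_t) ldexp (m, 53);   /* 53-bit integer significand */
  int tz = 0;
  while ((bits & 1) == 0)
    {
      bits >>= 1;
      tz++;
    }
  return e - 53 + tz;
}

/* Power of two of the highest set bit of a finite, nonzero, positive double.  */
static int highest_bit (double x)
{
  int e;
  (void) frexp (x, &e);
  return e - 1;
}

/* True if hi + lo needs no more than the 113 significand bits that IEEE
   binary128 (KFmode) provides.  Assumes 0 < lo < ulp (hi).  */
static bool dd_fits_binary128 (struct dd v)
{
  int span = highest_bit (v.hi) - lowest_bit (v.lo) + 1;
  return span <= 113;
}

/* True if 2^exponent, which binary128 can hold for exponents from -16494
   to 16383, is also a finite double and hence a double-double (IFmode).  */
static bool pow2_fits_dd (int exponent)
{
  return exponent >= DBL_MIN_EXP - DBL_MANT_DIG && exponent < DBL_MAX_EXP;
}
```

Under these assumptions the pair (1.0, 2^-200) spans 201 bits, so
dd_fits_binary128 rejects it, while pow2_fits_dd (1100) is false because
2^1100 lies outside the double range.  Each format holds a value the other
cannot, which is the sense in which the two modes cannot be ordered.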

Until that problem is solved, we CANNOT move forward.  Your 126/127/128
precision hack gave us some time, but nothing has been improved since
then, and things have started to fall apart at the seams again.


Segher
  
Michael Meissner Aug. 11, 2022, 8:01 p.m. UTC | #5
On Wed, Aug 10, 2022 at 12:03:16PM -0500, Segher Boessenkool wrote:
> On Wed, Aug 10, 2022 at 02:23:27AM -0400, Michael Meissner wrote:
> > On Fri, Aug 05, 2022 at 01:19:05PM -0500, Segher Boessenkool wrote:
> > > On Thu, Jul 28, 2022 at 12:43:49AM -0400, Michael Meissner wrote:
> > > > These patches lay the foundation for a set of follow-on patches that will
> > > > change the internal handling of 128-bit floating point types in GCC.  In the
> > > > future patches, I hope to change the compiler to always use KFmode for the
> > > > explicit _Float128/__float128 types, to always use TFmode for the long double
> > > > type, no matter which 128-bit floating point type is used, and IFmode for the
> > > > explicit __ibm128 type.
> > > 
> > > Making TFmode different from KFmode and IFmode is not an improvement.
> > > NAK.
> > 
> > First of all, it already IS different from KFmode and IFmode, as we've talked
> > about.
> 
> It always is the same as either IFmode or KFmode in the end.  It is a
> separate mode, yes, because generic code always wants to use TFmode.
> 
> > I'm trying to clean this mess up.  Having explicit __float128's being
> > converted to TFmode if -mabi=ieeelongdouble is just as bad, and it means that
> > _Float128 and __float128 are not the same type.
> 
> What do types have to do with this at all?

I believe the issue is with these two tests:

	gcc.dg/torture/float128-nan.c
	gcc.target/powerpc/nan128-1.c

In particular, both use nansq to create a signaling NaN.  The nansq function is
defined as nansf128 (i.e. it returns a _Float128 type).

However, at present, __float128 uses the long double type if the
-mabi=ieeelongdouble option is used, not the _Float128 type.  Even though these
both use TFmode in that scenario, the gimple code sees a _Float128 type that is
stored into a long double type.  The machine independent support sees that you
are changing types, and it silently converts the signaling NaN into a quiet
NaN.
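
The same effect can be seen on ordinary hardware with a stand-in pair of
types.  The sketch below is an assumption-laden analogy, not the Power case
itself: it uses float and double instead of _Float128 and long double, the
helper names are hypothetical, and it assumes an IEEE 754-2008 target such as
x86-64 where a format conversion quiets a signaling NaN.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Build a float from its raw IEEE binary32 bit pattern.  */
static float float_from_bits (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Read back the raw IEEE binary64 bit pattern of a double.  */
static uint64_t bits_from_double (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* A binary64 NaN is quiet when the top fraction bit (bit 51) is set.  */
static bool is_quiet_nan_bits (uint64_t bits)
{
  bool is_nan = (bits & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL
		&& (bits & 0x000fffffffffffffULL) != 0;
  return is_nan && (bits & 0x0008000000000000ULL) != 0;
}

/* Widen a float to a double at run time; the volatile copy keeps the
   compiler from folding the conversion away.  */
static double widen (float f)
{
  volatile float src = f;
  return (double) src;
}
```

Here 0x7fa00000 is a binary32 signaling NaN (quiet bit 22 clear), and
bits_from_double (widen (float_from_bits (0x7fa00000u))) comes back with the
quiet bit set.  A conversion between two distinct floating point types is
enough to lose the signaling property, even when no value bits need to change.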

An earlier patch was to just change nanq and nansq to resolve to nanl and nansl
in the case -mabi=ieeelongdouble, which you did not like.

In looking at it, I now believe that the type for _Float128 and __float128
should always be the same within the compiler.  Whether we would continue to
use the same type for long double and _Float128/__float128 remains to be seen.

But in doing the change, there are several places that need to be changed as
well.

> If TFmode means IEEE QP float, TFmode and KFmode can be used
> interchangeably.  When TFmode means double-double, TFmode and IFmode can
> be used interchangeably.  We should never depend on TFmode being
> different from both underlying modes, that way madness lies.

No, this is not supported without conversions being done (even if the
conversions are eventually nop conversions).  GCC firmly believes that there are
no modes that are equivalent and can be used interchangeably.

For example, in the float128-odd.c test case we have:

	__float128
	f128_fms (__float128 a, __float128 b, __float128 c)
	{
	  return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

By default, if we just use the KFmode functions (because that is how they are
defined in the built-in tables) on a system where __float128 uses the long
double type and TFmode, and we remove the code in rs6000_expand_builtin that
changes the built-in (i.e. overloading by another name), the compiler will
currently trap: it calls copy_to_mode_reg if the predicate fails, and the mode
being copied is different from the mode of the operand.

The predicate fails because the type in the insn (i.e. KFmode) is not the same
as the type of the operand (i.e. TFmode), and the default predicate
(i.e. register_operand, altivec_register_operand, or vsx_register_operand)
checks the mode.

But that can be fixed by using convert_move's instead of copy_to_mode_reg, and
possibly with new predicates that support either TFmode or KFmode.

However, then GCC will insert converts going from TFmode to KFmode.  That
avoids the crash, but the converts mean that the combiner won't combine the
negate and __builtin_fmaf128_round_to_odd to produce the single "xsmsubqpo"
instruction.  Instead it will generate a negate and then an "xsmaddqpo"
instruction.

I've played with adding new predicates that recognize either IEEE 128-bit type
and a separate one that recognizes either IBM 128-bit type.


This is why I proposed to have overload support so that the built-in functions
will automatically use a TFmode built-in or a KFmode built-in depending on
the mode of the operands.
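
As a rough illustration of the idea (not the rs6000 overload tables
themselves), C11 _Generic can pick one of two instances of an operation from
the type of the argument.  Everything here is a stand-in: double plays the
role of the KFmode type, long double plays the role of the TFmode type, and
fms_kf, fms_tf and fms_overloaded are hypothetical names.

```c
/* Records which instance was selected, so the dispatch is observable.  */
enum variant { VARIANT_KF, VARIANT_TF };

struct fms_result
{
  enum variant used;
  long double value;
};

/* Stand-in for the KFmode instance of a fused multiply-subtract.  */
static struct fms_result fms_kf (double a, double b, double c)
{
  struct fms_result r;
  r.used = VARIANT_KF;
  r.value = a * b - c;
  return r;
}

/* Stand-in for the TFmode instance of the same operation.  */
static struct fms_result fms_tf (long double a, long double b, long double c)
{
  struct fms_result r;
  r.used = VARIANT_TF;
  r.value = a * b - c;
  return r;
}

/* One user-visible name; the operand type selects the instance, so no
   conversion between the two types is ever inserted.  */
#define fms_overloaded(a, b, c) \
  _Generic ((a), double: fms_kf, long double: fms_tf) ((a), (b), (c))
```

With this sketch, fms_overloaded (2.0, 3.0, 1.0) goes to fms_kf and
fms_overloaded (2.0L, 3.0L, 1.0L) goes to fms_tf.  Because each call already
matches its instance, nothing sits between a negate and the operation, which
is the property the combiner needs in the real compiler.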
  
Joseph Myers Aug. 11, 2022, 8:44 p.m. UTC | #6
On Thu, 11 Aug 2022, Michael Meissner via Gcc-patches wrote:

> In looking at it, I now believe that the type for _Float128 and __float128
> should always be the same within the compiler.  Whether we would continue to
> use the same type for long double and _Float128/__float128 remains to be seen.

long double and _Float128 must always be different types; that's how it's 
defined in C23.
  
Jakub Jelinek Aug. 16, 2022, 6:07 p.m. UTC | #7
On Thu, Aug 11, 2022 at 08:44:17PM +0000, Joseph Myers wrote:
> On Thu, 11 Aug 2022, Michael Meissner via Gcc-patches wrote:
> 
> > In looking at it, I now believe that the type for _Float128 and __float128
> > should always be the same within the compiler.  Whether we would continue to
> > use the same type for long double and _Float128/__float128 remains to be seen.
> 
> long double and _Float128 must always be different types; that's how it's 
> defined in C23.

And when we implement C++23 P1467R9, if std::float128_t will be
_Float128 under the hood, then long double and _Float128 have to remain
distinct types and mangle differently, long double (and __float128 if
long double is IEEE quad and __float128 exists?) need to mangle the way
they currently do and _Float128 should mangle as  DF128_ .
             ::= DF <number> _ # ISO/IEC TS 18661 binary floating point type _FloatN (N bits)
Wonder how shall we mangle the underlying type of std::bfloat16_t though.
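
For readers unfamiliar with the grammar line above, a small sketch of what it
produces.  The helper mangle_floatn_fn is hypothetical and only handles the
simplest case, a global function "void NAME (_FloatN)" under the Itanium C++
ABI; it is not how GCC implements mangling.

```c
#include <stdio.h>
#include <string.h>

/* Write the Itanium C++ ABI mangled name of "void NAME (_FloatN)" into BUF:
   "_Z", the length-prefixed identifier, then "DF <number> _" for the
   parameter type.  Returns what snprintf returns.  */
static int mangle_floatn_fn (char *buf, size_t size, const char *name,
			     unsigned bits)
{
  return snprintf (buf, size, "_Z%zu%sDF%u_", strlen (name), name, bits);
}
```

Calling mangle_floatn_fn (buf, sizeof buf, "f", 128) fills buf with
"_Z1fDF128_", which differs from the mangling of the same function taking a
long double, so the two overloads stay distinct at link time.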

I assume e.g. for libstdc++ implementation purposes we need to have
__ibm128 and __float128 types mangling as long double mangles when the
-mabi={ibm,ieee}longdouble option is used, because otherwise it would be
really hard to implement it.

	Jakub
  
Segher Boessenkool Aug. 16, 2022, 6:55 p.m. UTC | #8
Hi!

On Tue, Aug 16, 2022 at 08:07:48PM +0200, Jakub Jelinek wrote:
> On Thu, Aug 11, 2022 at 08:44:17PM +0000, Joseph Myers wrote:
> > On Thu, 11 Aug 2022, Michael Meissner via Gcc-patches wrote:
> > > In looking at it, I now believe that the type for _Float128 and __float128
> > > should always be the same within the compiler.  Whether we would continue to
> > > use the same type for long double and _Float128/__float128 remains to be seen.
> > 
> > long double and _Float128 must always be different types; that's how it's 
> > defined in C23.
> 
> And when we implement C++23 P1467R9, if std::float128_t will be
> _Float128 under the hood, then long double and _Float128 have to remain
> distinct types and mangle differently, long double (and __float128 if
> long double is IEEE quad and __float128 exists?) need to mangle the way
> they currently do and _Float128 should mangle as  DF128_ .
>              ::= DF <number> _ # ISO/IEC TS 18661 binary floating point type _FloatN (N bits)

So should we make std::floatNN_t be the same as _FloatNN, and mangled
as DF<NN>_ ?  And __ieee128 (and long double implemented as that) the
same as we already have.

> Wonder how shall we mangle the underlying type of std::bfloat16_t though.

That should get some cross-platform mangling?  Power shouldn't go its
own way here :-)

> I assume e.g. for libstdc++ implementation purposes we need to have
> __ibm128 and __float128 types mangling as long double mangles when the
> -mabi={ibm,ieee}longdouble option is used, because otherwise it would be
> really hard to implement it.

If at all possible it should be the same as we have already: otherwise
it will be at least five years before anything works again (for users).

This agrees with what you propose afaics, but let's make this explicit?
It helps us sleep at night :-)


Segher