[2/3] Make __float128 use the _Float128 type, PR target/107299

Message ID Y2HZFlHH8HuvhGL4@toto.the-meissners.org
State New
Series [1/3] Rework 128-bit complex multiply and divide, PR target/107299 |

Commit Message

Michael Meissner Nov. 2, 2022, 2:42 a.m. UTC
  This patch fixes the issue that GCC cannot build when the default long double
is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
during the evrp pass.  Ultimately it is failing because the code declared the
type to use TFmode but it used F128 functions (i.e. KFmode).

	typedef float TFtype __attribute__((mode (TF)));
	typedef __complex float TCtype __attribute__((mode (TC)));

	TCtype
	__mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
	{
	  TFtype ac, bd, ad, bc, x, y;
	  TCtype res;

	  ac = a * c;
	  bd = b * d;
	  ad = a * d;
	  bc = b * c;

	  x = ac - bd;
	  y = ad + bc;

	  if (__builtin_isnan (x) && __builtin_isnan (y))
	    {
	      _Bool recalc = 0;
	      if (__builtin_isinf (a) || __builtin_isinf (b))
		{

		  a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
		  b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (__builtin_isinf (c) || __builtin_isinf (d))
		{

		  c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
		  d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  recalc = 1;
		}
	      if (!recalc
		  && (__builtin_isinf (ac) || __builtin_isinf (bd)
		      || __builtin_isinf (ad) || __builtin_isinf (bc)))
		{

		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (recalc)
		{
		  x = __builtin_inff128 () * (a * c - b * d);
		  y = __builtin_inff128 () * (a * d + b * c);
		}
	    }

	  __real__ res = x;
	  __imag__ res = y;
	  return res;
	}

Currently GCC uses the long double type node for __float128 if long double is
IEEE 128-bit.  It does not use the node for _Float128.

Originally this was noticed when calling the nansq function to make a signaling
NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
different from __float128, the machine independent code converts signaling NaNs
to quiet NaNs if the types are not compatible.  The following tests used to
fail when run on a system where long double is IEEE 128-bit:

	gcc.dg/torture/float128-nan.c
	gcc.target/powerpc/nan128-1.c
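What those tests check comes down to a single bit in the IEEE binary128
encoding: a signaling NaN has the top mantissa bit clear, and quieting it
sets that bit.  As a hedged, target-independent sketch (the bit layout below
comes from IEEE 754, not from this patch; the helper names are illustrative):

```c
#include <stdint.h>

/* IEEE 754 binary128 viewed as two 64-bit halves; HI holds the sign
   bit, the 15 exponent bits (62..48) and the top 48 mantissa bits.
   A NaN has all exponent bits set and a nonzero mantissa; the "quiet"
   bit is the top mantissa bit, i.e. bit 47 of HI.  Quieting a
   signaling NaN -- which is what happens when it passes through an
   incompatible type -- sets that bit.  */
static int
is_nan128 (uint64_t hi, uint64_t lo)
{
  uint64_t exp = (hi >> 48) & 0x7fff;
  uint64_t mant_hi = hi & 0x0000ffffffffffffULL;
  return exp == 0x7fff && (mant_hi != 0 || lo != 0);
}

static int
is_signaling_nan128 (uint64_t hi, uint64_t lo)
{
  return is_nan128 (hi, lo) && ((hi >> 47) & 1) == 0;
}
```

Under this layout 0x7fff000000000001/0x0 is signaling while
0x7fff800000000000/0x0 is quiet; that one-bit difference is what the NaN
tests above look for.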

This patch makes both __float128 and _Float128 use the same type node.

One side effect of not using the long double type node for __float128 is that we
must only use KFmode for _Float128/__float128.  The libstdc++ library won't
build if we use TFmode for _Float128 and __float128 when long double is IEEE
128-bit.

Another minor side effect is that the f128 round to odd fused multiply-add
function will not merge negation with the FMA operation when the type is long
double.  If the type is __float128 or _Float128, then it will continue to do the
optimization.  The round to odd functions are defined in terms of __float128
arguments.  For example:

	long double
	do_fms (long double a, long double b, long double c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate (assuming -mabi=ieeelongdouble):

	xsnegqp 4,4
	xsmaddqpo 4,2,3
	xxlor 34,36,36

while:

	__float128
	do_fms (__float128 a, __float128 b, __float128 c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate:

	xsmsubqpo 4,2,3
	xxlor 34,36,36

I tested all 3 patches for PR target/107299 on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-11-01   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the
	_Float128 type for __float128.
	(rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the
	built-in passes or returns TFmode.  If the predicate failed because the
	modes were different, use convert_move to load up the value instead of
	copy_to_mode_reg.
	* config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't
	translate IEEE 128-bit floating point modes to explicit IEEE 128-bit
	modes (KFmode or KCmode), even if long double is IEEE 128-bit.
	(rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the
	time if we support IEEE 128-bit floating point.
	(rs6000_floatn_mode): _Float128 and _Float128x always use KFmode.

gcc/testsuite/

	PR target/107299
	* gcc.target/powerpc/float128-hw12.c: New test.
	* gcc.target/powerpc/float128-hw13.c: Likewise.
	* gcc.target/powerpc/float128-hw4.c: Update insns.
---
 gcc/config/rs6000/rs6000-builtin.cc           | 237 ++++++++++--------
 gcc/config/rs6000/rs6000.cc                   |  31 ++-
 .../gcc.target/powerpc/float128-hw12.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw13.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw4.c         |  10 +-
 5 files changed, 431 insertions(+), 121 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw12.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw13.c
  

Comments

Michael Meissner Nov. 7, 2022, 3:43 p.m. UTC | #1
Ping patch:

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: <Y2HZFlHH8HuvhGL4@toto.the-meissners.org>

This patch is needed to build GCC on Fedora 36 which has switched the long
double default to be IEEE 128-bit.
  
Michael Meissner Nov. 29, 2022, 5:44 p.m. UTC | #2
Can we get the three patches in this patch set reviewed?  Without the patches
applied, GCC 13 will not build on Fedora 37, which uses long double defaulting
to IEEE 128-bit.

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: <Y2HZFlHH8HuvhGL4@toto.the-meissners.org>
  
Michael Meissner Dec. 2, 2022, 6:01 p.m. UTC | #3
Ping for patches submitted on November 1st.  These 3 patches are needed to be
able to build GCC for PowerPC target systems where long double is configured to
be IEEE 128-bit, such as Fedora 36.

This is for the 2nd of 3 patches.

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: <Y2HZFlHH8HuvhGL4@toto.the-meissners.org>
| https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604836.html
  
Kewen.Lin Dec. 6, 2022, 11:27 a.m. UTC | #4
Hi Mike,

Thanks for fixing this, some comments are inlined below.

on 2022/11/2 10:42, Michael Meissner wrote:
> This patch fixes the issue that GCC cannot build when the default long double
> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> during the evrp pass.  Ultimately it is failing because the code declared the
> type to use TFmode but it used F128 functions (i.e. KFmode).
> 
> 	typedef float TFtype __attribute__((mode (TF)));
> 	typedef __complex float TCtype __attribute__((mode (TC)));
> 
> 	TCtype
> 	__mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
> 	{
> 	  TFtype ac, bd, ad, bc, x, y;
> 	  TCtype res;
> 
> 	  ac = a * c;
> 	  bd = b * d;
> 	  ad = a * d;
> 	  bc = b * c;
> 
> 	  x = ac - bd;
> 	  y = ad + bc;
> 
> 	  if (__builtin_isnan (x) && __builtin_isnan (y))
> 	    {
> 	      _Bool recalc = 0;
> 	      if (__builtin_isinf (a) || __builtin_isinf (b))
> 		{
> 
> 		  a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
> 		  b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
> 		  if (__builtin_isnan (c))
> 		    c = __builtin_copysignf128 (0, c);
> 		  if (__builtin_isnan (d))
> 		    d = __builtin_copysignf128 (0, d);
> 		  recalc = 1;
> 		}
> 	      if (__builtin_isinf (c) || __builtin_isinf (d))
> 		{
> 
> 		  c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
> 		  d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
> 		  if (__builtin_isnan (a))
> 		    a = __builtin_copysignf128 (0, a);
> 		  if (__builtin_isnan (b))
> 		    b = __builtin_copysignf128 (0, b);
> 		  recalc = 1;
> 		}
> 	      if (!recalc
> 		  && (__builtin_isinf (ac) || __builtin_isinf (bd)
> 		      || __builtin_isinf (ad) || __builtin_isinf (bc)))
> 		{
> 
> 		  if (__builtin_isnan (a))
> 		    a = __builtin_copysignf128 (0, a);
> 		  if (__builtin_isnan (b))
> 		    b = __builtin_copysignf128 (0, b);
> 		  if (__builtin_isnan (c))
> 		    c = __builtin_copysignf128 (0, c);
> 		  if (__builtin_isnan (d))
> 		    d = __builtin_copysignf128 (0, d);
> 		  recalc = 1;
> 		}
> 	      if (recalc)
> 		{
> 		  x = __builtin_inff128 () * (a * c - b * d);
> 		  y = __builtin_inff128 () * (a * d + b * c);
> 		}
> 	    }
> 
> 	  __real__ res = x;
> 	  __imag__ res = y;
> 	  return res;
> 	}
> 

A further reduced test case can be:

typedef float TFtype __attribute__((mode (TF)));

TFtype test (TFtype t)
{
  return __builtin_copysignf128 (1.0q, t);
}

Since this reduced test case is quite small, maybe it's good to add it as a
test case associated with this patch.

> Currently GCC uses the long double type node for __float128 if long double is
> IEEE 128-bit.  It did not use the node for _Float128.
> 
> Originally this was noticed if you call the nansq function to make a signaling
> NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
> different from __float128, the machine independent code converts signaling NaNs
> to quiet NaNs if the types are not compatible.  The following tests used to
> fail when run on a system where long double is IEEE 128-bit:
> 
> 	gcc.dg/torture/float128-nan.c
> 	gcc.target/powerpc/nan128-1.c
> 
> This patch makes both __float128 and _Float128 use the same type node.
> 
> One side effect of not using the long double type node for __float128 is that we
> must only use KFmode for _Float128/__float128.  The libstdc++ library won't
> build if we use TFmode for _Float128 and __float128 when long double is IEEE
> 128-bit.
> 

Sorry, I didn't get the point of the latter sentence.  This proposed patch
uses KFmode for _Float128 and __float128; do you mean that would be fine for
building the libstdc++ library since we don't use TFmode for them?

> Another minor side effect is that the f128 round to odd fused multiply-add
> function will not merge negation with the FMA operation when the type is long
> double.  If the type is __float128 or _Float128, then it will continue to do the
> optimization.  The round to odd functions are defined in terms of __float128
> arguments.  For example:
> 
> 	long double
> 	do_fms (long double a, long double b, long double c)
> 	{
> 	    return __builtin_fmaf128_round_to_odd (a, b, -c);
> 	}
> 
> will generate (assuming -mabi=ieeelongdouble):
> 
> 	xsnegqp 4,4
> 	xsmaddqpo 4,2,3
> 	xxlor 34,36,36
> 
> while:
> 
> 	__float128
> 	do_fms (__float128 a, __float128 b, __float128 c)
> 	{
> 	    return __builtin_fmaf128_round_to_odd (a, b, -c);
> 	}
> 
> will generate:
> 
> 	xsmsubqpo 4,2,3
> 	xxlor 34,36,36
> 

It sounds like a tiny "regression"; maybe we should open a PR to track it?

> I tested all 3 patches for PR target/107299 on:
> 
>     1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
>     2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
>     3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
>     4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm
> 
> Once all 3 patches have been applied, we can once again build GCC when long
> double is IEEE 128-bit.  There were no other regressions with these patches.
> Can I check these patches into the trunk?
> 
> 2022-11-01   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	PR target/107299
> 	* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the
> 	_Float128 type for __float128.
> 	(rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the
> 	built-in passes or returns TFmode.  If the predicate failed because the
> 	modes were different, use convert_move to load up the value instead of
> 	copy_to_mode_reg.
> 	* config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't
> 	translate IEEE 128-bit floating point modes to explicit IEEE 128-bit
> 	modes (KFmode or KCmode), even if long double is IEEE 128-bit.

You meant to say "Translate .." not "Don't .."?

> 	(rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the
> 	time if we support IEEE 128-bit floating point.
> 	(rs6000_floatn_mode): _Float128 and _Float128x always uses KFmode.
> 
> gcc/testsuite/
> 
> 	PR target/107299
> 	* gcc.target/powerpc/float128-hw12.c: New test.
> 	* gcc.target/powerpc/float128-hw13.c: Likewise.
> 	* gcc.target/powerpc/float128-hw4.c: Update insns.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc           | 237 ++++++++++--------
>  gcc/config/rs6000/rs6000.cc                   |  31 ++-
>  .../gcc.target/powerpc/float128-hw12.c        | 137 ++++++++++
>  .../gcc.target/powerpc/float128-hw13.c        | 137 ++++++++++
>  .../gcc.target/powerpc/float128-hw4.c         |  10 +-
>  5 files changed, 431 insertions(+), 121 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw12.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw13.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 90ab39dc258..e5298f45363 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -730,25 +730,28 @@ rs6000_init_builtins (void)
>  
>    if (TARGET_FLOAT128_TYPE)
>      {
> -      if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
> -	ieee128_float_type_node = long_double_type_node;
> -      else
> +      /* In the past we used long_double_type_node when long double was IEEE
> +	 128-bit.  However, this means that the _Float128 type
> +	 (i.e. float128_type_node) is a different type from __float128
> +	 (i.e. ieee128_float_type_nonde).  This leads to some corner cases,
                                  ~~~~ node
> +	 such as processing signaling NaNs with the nansf128 built-in function
> +	 (which returns a _Float128 value) and assign it to a long double or
> +	 __float128 value.  The two explicit IEEE 128-bit types should always
> +	 use the same internal type.
> +
> +	 For C we only need to register the __ieee128 name for it.  For C++, we
> +	 create a distinct type which will mangle differently (u9__ieee128)
> +	 vs. _Float128 (DF128_) and behave backwards compatibly.  */
> +      if (float128t_type_node == NULL_TREE)
>  	{
> -	  /* For C we only need to register the __ieee128 name for
> -	     it.  For C++, we create a distinct type which will mangle
> -	     differently (u9__ieee128) vs. _Float128 (DF128_) and behave
> -	     backwards compatibly.  */
> -	  if (float128t_type_node == NULL_TREE)
> -	    {
> -	      float128t_type_node = make_node (REAL_TYPE);
> -	      TYPE_PRECISION (float128t_type_node)
> -		= TYPE_PRECISION (float128_type_node);
> -	      layout_type (float128t_type_node);
> -	      SET_TYPE_MODE (float128t_type_node,
> -			     TYPE_MODE (float128_type_node));
> -	    }
> -	  ieee128_float_type_node = float128t_type_node;
> +	  float128t_type_node = make_node (REAL_TYPE);
> +	  TYPE_PRECISION (float128t_type_node)
> +	    = TYPE_PRECISION (float128_type_node);
> +	  layout_type (float128t_type_node);
> +	  SET_TYPE_MODE (float128t_type_node,
> +			 TYPE_MODE (float128_type_node));
>  	}
> +      ieee128_float_type_node = float128t_type_node;
>        t = build_qualified_type (ieee128_float_type_node, TYPE_QUAL_CONST);
>        lang_hooks.types.register_builtin_type (ieee128_float_type_node,
>  					      "__ieee128");
> @@ -3265,13 +3268,13 @@ htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
>  
>  /* Expand an expression EXP that calls a built-in function,
>     with result going to TARGET if that's convenient
> -   (and in mode MODE if that's convenient).
> +   (and in mode RETURN_MODE if that's convenient).
>     SUBTARGET may be used as the target for computing one of EXP's operands.
>     IGNORE is nonzero if the value is to be ignored.
>     Use the new builtin infrastructure.  */
>  rtx
>  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
> -		       machine_mode /* mode */, int ignore)
> +		       machine_mode return_mode, int ignore)
>  {
>    tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
>    enum rs6000_gen_builtins fcode
> @@ -3287,78 +3290,99 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
>    size_t uns_fcode = (size_t)fcode;
>    enum insn_code icode = rs6000_builtin_info[uns_fcode].icode;
>  
> -  /* TODO: The following commentary and code is inherited from the original
> -     builtin processing code.  The commentary is a bit confusing, with the
> -     intent being that KFmode is always IEEE-128, IFmode is always IBM
> -     double-double, and TFmode is the current long double.  The code is
> -     confusing in that it converts from KFmode to TFmode pattern names,
> -     when the other direction is more intuitive.  Try to address this.  */
> -
> -  /* We have two different modes (KFmode, TFmode) that are the IEEE
> -     128-bit floating point type, depending on whether long double is the
> -     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
> -     It is simpler if we only define one variant of the built-in function,
> -     and switch the code when defining it, rather than defining two built-
> -     ins and using the overload table in rs6000-c.cc to switch between the
> -     two.  If we don't have the proper assembler, don't do this switch
> -     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
> -  if (FLOAT128_IEEE_P (TFmode))
> -    switch (icode)
> -      {
> -      case CODE_FOR_sqrtkf2_odd:
> -	icode = CODE_FOR_sqrttf2_odd;
> -	break;
> -      case CODE_FOR_trunckfdf2_odd:
> -	icode = CODE_FOR_trunctfdf2_odd;
> -	break;
> -      case CODE_FOR_addkf3_odd:
> -	icode = CODE_FOR_addtf3_odd;
> -	break;
> -      case CODE_FOR_subkf3_odd:
> -	icode = CODE_FOR_subtf3_odd;
> -	break;
> -      case CODE_FOR_mulkf3_odd:
> -	icode = CODE_FOR_multf3_odd;
> -	break;
> -      case CODE_FOR_divkf3_odd:
> -	icode = CODE_FOR_divtf3_odd;
> -	break;
> -      case CODE_FOR_fmakf4_odd:
> -	icode = CODE_FOR_fmatf4_odd;
> -	break;
> -      case CODE_FOR_xsxexpqp_kf:
> -	icode = CODE_FOR_xsxexpqp_tf;
> -	break;
> -      case CODE_FOR_xsxsigqp_kf:
> -	icode = CODE_FOR_xsxsigqp_tf;
> -	break;
> -      case CODE_FOR_xststdcnegqp_kf:
> -	icode = CODE_FOR_xststdcnegqp_tf;
> -	break;
> -      case CODE_FOR_xsiexpqp_kf:
> -	icode = CODE_FOR_xsiexpqp_tf;
> -	break;
> -      case CODE_FOR_xsiexpqpf_kf:
> -	icode = CODE_FOR_xsiexpqpf_tf;
> -	break;
> -      case CODE_FOR_xststdcqp_kf:
> -	icode = CODE_FOR_xststdcqp_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_eq_kf:
> -	icode = CODE_FOR_xscmpexpqp_eq_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_lt_kf:
> -	icode = CODE_FOR_xscmpexpqp_lt_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_gt_kf:
> -	icode = CODE_FOR_xscmpexpqp_gt_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_unordered_kf:
> -	icode = CODE_FOR_xscmpexpqp_unordered_tf;
> -	break;
> -      default:
> -	break;
> -      }
> +  /* For 128-bit long double, we may need both the KFmode built-in functions
> +     and IFmode built-in functions to the equivalent TFmode built-in function,
> +     if either a TFmode result is expected or any of the arguments use
> +     TFmode.  */
> +  if (TARGET_LONG_DOUBLE_128)
> +    {
> +      bool uses_tf_mode = return_mode == TFmode;
> +      if (!uses_tf_mode)
> +	{
> +	  call_expr_arg_iterator iter;
> +	  tree arg;
> +	  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
> +	    {
> +	      if (arg != error_mark_node
> +		  && TYPE_MODE (TREE_TYPE (arg)) == TFmode)
> +		{
> +		  uses_tf_mode = true;
> +		  break;
> +		}
> +	    }
> +	}
> +
> +      /* Convert KFmode built-in functions to TFmode when long double is IEEE
> +	 128-bit.  */
> +      if (uses_tf_mode && FLOAT128_IEEE_P (TFmode))
> +	switch (icode)
> +	  {
> +	  case CODE_FOR_sqrtkf2_odd:
> +	    icode = CODE_FOR_sqrttf2_odd;
> +	    break;
> +	  case CODE_FOR_trunckfdf2_odd:
> +	    icode = CODE_FOR_trunctfdf2_odd;
> +	    break;
> +	  case CODE_FOR_addkf3_odd:
> +	    icode = CODE_FOR_addtf3_odd;
> +	    break;
> +	  case CODE_FOR_subkf3_odd:
> +	    icode = CODE_FOR_subtf3_odd;
> +	    break;
> +	  case CODE_FOR_mulkf3_odd:
> +	    icode = CODE_FOR_multf3_odd;
> +	    break;
> +	  case CODE_FOR_divkf3_odd:
> +	    icode = CODE_FOR_divtf3_odd;
> +	    break;
> +	  case CODE_FOR_fmakf4_odd:
> +	    icode = CODE_FOR_fmatf4_odd;
> +	    break;
> +	  case CODE_FOR_xsxexpqp_kf:
> +	    icode = CODE_FOR_xsxexpqp_tf;
> +	    break;
> +	  case CODE_FOR_xsxsigqp_kf:
> +	    icode = CODE_FOR_xsxsigqp_tf;
> +	    break;
> +	  case CODE_FOR_xststdcnegqp_kf:
> +	    icode = CODE_FOR_xststdcnegqp_tf;
> +	    break;
> +	  case CODE_FOR_xsiexpqp_kf:
> +	    icode = CODE_FOR_xsiexpqp_tf;
> +	    break;
> +	  case CODE_FOR_xsiexpqpf_kf:
> +	    icode = CODE_FOR_xsiexpqpf_tf;
> +	    break;
> +	  case CODE_FOR_xststdcqp_kf:
> +	    icode = CODE_FOR_xststdcqp_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_eq_kf:
> +	    icode = CODE_FOR_xscmpexpqp_eq_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_lt_kf:
> +	    icode = CODE_FOR_xscmpexpqp_lt_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_gt_kf:
> +	    icode = CODE_FOR_xscmpexpqp_gt_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_unordered_kf:
> +	    icode = CODE_FOR_xscmpexpqp_unordered_tf;
> +	    break;
> +	  default:
> +	    break;
> +	  }
> +
> +      /* Convert IFmode built-in functions to TFmode when long double is IBM
> +	 128-bit.  */
> +      else if (uses_tf_mode && FLOAT128_IBM_P (TFmode))
> +	{
> +	  if (icode == CODE_FOR_packif)
> +	    icode = CODE_FOR_packtf;
> +
> +	  else if (icode == CODE_FOR_unpackif)
> +	    icode = CODE_FOR_unpacktf;
> +	}
> +    }
>  
>    /* In case of "#pragma target" changes, we initialize all builtins
>       but check for actual availability now, during expand time.  For
> @@ -3481,18 +3505,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
>  
>    if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
>      {
> -      if (fcode == RS6000_BIF_PACK_IF)
> -	{
> -	  icode = CODE_FOR_packtf;
> -	  fcode = RS6000_BIF_PACK_TF;
> -	  uns_fcode = (size_t) fcode;
> -	}
> -      else if (fcode == RS6000_BIF_UNPACK_IF)
> -	{
> -	  icode = CODE_FOR_unpacktf;
> -	  fcode = RS6000_BIF_UNPACK_TF;
> -	  uns_fcode = (size_t) fcode;
> -	}
>      }

This remaining "if" hunk is useless.  The others look good to me.

BR,
Kewen
  
Kewen.Lin Dec. 14, 2022, 8:46 a.m. UTC | #5
on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
> Hi Mike,
> 
> Thanks for fixing this, some comments are inlined below.
> 
> on 2022/11/2 10:42, Michael Meissner wrote:
>> This patch fixes the issue that GCC cannot build when the default long double
>> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
>> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
>> during the evrp pass.  Ultimately it is failing because the code declared the
>> type to use TFmode but it used F128 functions (i.e. KFmode).

By looking further into this, I found that though __float128 and _Float128
are two different types, they have the same mode, TFmode; the unexpected thing is
that these two types have different precisions.  I noticed it's due to the "workaround"
in build_common_tree_nodes:

      /* Work around the rs6000 KFmode having precision 113 not
	 128.  */
      const struct real_format *fmt = REAL_MODE_FORMAT (mode);
      gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
      int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
      if (!extended)
	gcc_assert (min_precision == n);
      if (precision < min_precision)
	precision = min_precision;

Since the function useless_type_conversion_p considers two float types compatible
if they have the same mode, it doesn't require explicit conversions between
these two types.  I think that's exactly what we want.  And to me, it looks unexpected
to have two types with the same mode but different precisions.
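A loose model of that compatibility rule (simplified from
useless_type_conversion_p in gimple-expr.cc; the struct and field names below
are illustrative stand-ins, not GCC's):

```c
/* For scalar floating-point types, GIMPLE treats a conversion as
   useless -- no explicit convert statement needed -- when the two
   types share a machine mode; TYPE_PRECISION is not consulted.  So
   __float128 and _Float128, both TFmode here, need no conversion even
   though their recorded precisions differ (128 vs. 113).  */
struct scalar_float_type
{
  int machine_mode;    /* stand-in for TYPE_MODE */
  int type_precision;  /* stand-in for TYPE_PRECISION */
};

static int
useless_float_conversion_p (struct scalar_float_type outer,
			    struct scalar_float_type inner)
{
  return outer.machine_mode == inner.machine_mode;
}
```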

So could we consider disabling the above workaround to make _Float128 have the same
precision as __float128 (long double) (the underlying TFmode)?  I tried the below
change:

diff --git a/gcc/tree.cc b/gcc/tree.cc
index 254b2373dcf..10fcb3d88ca 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9442,6 +9442,7 @@ build_common_tree_nodes (bool signed_char)
       if (!targetm.floatn_mode (n, extended).exists (&mode))
         continue;
       int precision = GET_MODE_PRECISION (mode);
+#if 0
       /* Work around the rs6000 KFmode having precision 113 not
          128.  */
       const struct real_format *fmt = REAL_MODE_FORMAT (mode);
@@ -9451,6 +9452,7 @@ build_common_tree_nodes (bool signed_char)
         gcc_assert (min_precision == n);
       if (precision < min_precision)
         precision = min_precision;
+#endif
       FLOATN_NX_TYPE_NODE (i) = make_node (REAL_TYPE);
       TYPE_PRECISION (FLOATN_NX_TYPE_NODE (i)) = precision;
       layout_type (FLOATN_NX_TYPE_NODE (i));

It can be bootstrapped (fixing the ICE in PR107299).  Comparing with the baseline
regression test results with patches #1, #2 and #3, I got some positive results:

FAIL->PASS: 17_intro/headers/c++2020/all_attributes.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/all_no_exceptions.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/all_no_rtti.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/all_pedantic_errors.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/operator_names.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/stdc++.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/stdc++_multiple_inclusion.cc (test for excess errors)
FAIL->PASS: std/format/arguments/args.cc (test for excess errors)
FAIL->PASS: std/format/error.cc (test for excess errors)
FAIL->PASS: std/format/formatter/requirements.cc (test for excess errors)
FAIL->PASS: std/format/functions/format.cc (test for excess errors)
FAIL->PASS: std/format/functions/format_to_n.cc (test for excess errors)
FAIL->PASS: std/format/functions/size.cc (test for excess errors)
FAIL->PASS: std/format/functions/vformat_to.cc (test for excess errors)
FAIL->PASS: std/format/parse_ctx.cc (test for excess errors)
FAIL->PASS: std/format/string.cc (test for excess errors)
FAIL->PASS: std/format/string_neg.cc (test for excess errors)
FAIL->PASS: g++.dg/cpp23/ext-floating1.C  -std=gnu++23 (test for excess errors)

and some negative results:

PASS->FAIL: gcc.dg/torture/float128-nan.c   -O0  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O1  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O2  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O3 -g  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -Os  execution test
PASS->FAIL: gcc.target/powerpc/nan128-1.c execution test

The negative part is about NaNs; I haven't looked into it, but I think it may be the
reason why we need the workaround there (CC Joseph).  Anyway, it needs more
investigation, but IMHO the required information (i.e. the actual precision) can be
retrieved from REAL_MODE_FORMAT (mode) of the TYPE_MODE, so it should be doable to
fix the other places.

Some concerns with the approach of patch #2 are:
  1) long double has TFmode, so it's not taken as compatible with _Float128 even
     if -mabi=ieeelongdouble is specified, and we can get less efficient IR than
     before.
  2) we make the mode (TF) type attribute map to _Float128 (KFmode), but there is
     one actual type, long double, with TFmode, so they are actually inconsistent.
Compared with patch #2, removing the workaround in build_common_tree_nodes
can still keep things as before.  It may be something we can consider here.

BR,
Kewen
  
Jakub Jelinek Dec. 14, 2022, 9:36 a.m. UTC | #6
On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
> on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
> > Hi Mike,
> > 
> > Thanks for fixing this, some comments are inlined below.
> > 
> > on 2022/11/2 10:42, Michael Meissner wrote:
> >> This patch fixes the issue that GCC cannot build when the default long double
> >> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> >> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> >> during the evrp pass.  Ultimately it is failing because the code declared the
> >> type to use TFmode but it used F128 functions (i.e. KFmode).
> 
> By further looking into this, I found that though __float128 and _Float128 types
> are two different types, they have the same mode TFmode, the unexpected thing is
> these two types have different precision.  I noticed it's due to the "workaround"
> in build_common_tree_nodes:
> 
>       /* Work around the rs6000 KFmode having precision 113 not
> 	 128.  */
>       const struct real_format *fmt = REAL_MODE_FORMAT (mode);
>       gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
>       int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
>       if (!extended)
> 	gcc_assert (min_precision == n);
>       if (precision < min_precision)
> 	precision = min_precision;
> 
> Since function useless_type_conversion_p considers two float types are compatible
> if they have the same mode, so it doesn't require the explicit conversions between
> these two types.  I think it's exactly what we want.  And to me, it looks unexpected
> to have two types with the same mode but different precision.
> 
> So could we consider disabling the above workaround to make _Float128 have the same
> precision as __float128 (long double) (the underlying TFmode)?  I tried the below
> change:

The hacks with different precisions of the powerpc 128-bit floating types are
very unfortunate; I assume it is because the middle-end asserted that scalar
floating point types with different modes have different precisions.
We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
the same 16-bit precision as well and e.g. C++ FE knows to treat standard
vs. extended floating point types vs. other unknown floating point types
differently in finding result type of binary operations or in which type
comparisons will be done.  That said, we'd need some target hooks to
preserve the existing behavior with __float128/__ieee128 vs. __ibm128
vs. _Float128 with both -mabi=ibmlongdouble and -mabi=ieeelongdouble.

I bet the above workaround in generic code was added for a reason; it would
surprise me if _Float128 worked at all without that hack.
Shouldn't float128_type_node be adjusted the same way instead?

	Jakub
  
Kewen.Lin Dec. 14, 2022, 10:11 a.m. UTC | #7
Hi Jakub,

Thanks for the comments!

on 2022/12/14 17:36, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
>> on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
>>> Hi Mike,
>>>
>>> Thanks for fixing this, some comments are inlined below.
>>>
>>> on 2022/11/2 10:42, Michael Meissner wrote:
>>>> This patch fixes the issue that GCC cannot build when the default long double
>>>> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
>>>> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
>>>> during the evrp pass.  Ultimately it is failing because the code declared the
>>>> type to use TFmode but it used F128 functions (i.e. KFmode).
>>
>> By further looking into this, I found that though __float128 and _Float128 types
>> are two different types, they have the same mode TFmode, the unexpected thing is
>> these two types have different precision.  I noticed it's due to the "workaround"
>> in build_common_tree_nodes:
>>
>>       /* Work around the rs6000 KFmode having precision 113 not
>> 	 128.  */
>>       const struct real_format *fmt = REAL_MODE_FORMAT (mode);
>>       gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
>>       int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
>>       if (!extended)
>> 	gcc_assert (min_precision == n);
>>       if (precision < min_precision)
>> 	precision = min_precision;
>>
>> Since function useless_type_conversion_p considers two float types compatible
>> if they have the same mode, it doesn't require explicit conversions between
>> these two types.  I think it's exactly what we want.  And to me, it looks unexpected
>> to have two types with the same mode but different precision.
>>
>> So could we consider disabling the above workaround to make _Float128 have the same
>> precision as __float128 (long double) (the underlying TFmode)?  I tried the below
>> change:
> 
> The hacks with different precisions of powerpc 128-bit floating types are
> very unfortunate, it is I assume because the middle-end asserted that scalar
> floating point types with different modes have different precision.
> We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
> the same 16-bit precision as well and e.g. C++ FE knows to treat standard
> vs. extended floating point types vs. other unknown floating point types
> differently in finding result type of binary operations or in which type
> comparisons will be done.  

It's good news.  For now those three long double modes on Power have different
precisions; if they can have the same precision, I'd expect the ICE to be
gone.

> That said, we'd need some target hooks to
> preserve the existing behavior with __float128/__ieee128 vs. __ibm128
> vs. _Float128 with both -mabi=ibmlongdouble and -mabi=ieeelongdouble.
> 
> I bet the above workaround in generic code was added for a reason; it would
> surprise me if _Float128 worked at all without that hack.

OK, I'll have a look at those NaN failures soon.

> Shouldn't float128_type_node be adjusted instead the same way?

Not sure; the regression testing only exposed NaN-related failures.

BR,
Kewen
  
Jakub Jelinek Dec. 14, 2022, 10:33 a.m. UTC | #8
On Wed, Dec 14, 2022 at 06:11:26PM +0800, Kewen.Lin wrote:
> > The hacks with different precisions of powerpc 128-bit floating types are
> > very unfortunate, it is I assume because the middle-end asserted that scalar
> > floating point types with different modes have different precision.
> > We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
> > the same 16-bit precision as well and e.g. C++ FE knows to treat standard
> > vs. extended floating point types vs. other unknown floating point types
> > differently in finding result type of binary operations or in which type
> > comparisons will be done.  
> 
> It's good news.  For now those three long double modes on Power have different
> precisions; if they can have the same precision, I'd expect the ICE to be
> gone.

I'm talking mainly about r13-3292: the asserts now check that different modes
have different precision unless it is half vs. brain (HFmode vs. BFmode) or
vice versa; that could be changed further.  But if the precision is the same,
the FEs and the middle-end need to know how to deal with those types.
For C++23, say when __ibm128 is the same as long double and _Float128 is
supported, the 2 types are unordered (neither is a subset or superset of
the other because there are many _Float128 values one can't represent
in double double (whether anything with exponent larger than what double
can represent or most of the more precise values), but because of the
variable precision there are double double values that can't be represented
in _Float128 either) and so we can error on comparisons of those or on
arithmetics with such arguments (unless explicitly converted first).
But for backwards compatibility we can't do that for __float128 vs. __ibm128
and so need to backwards compatibly decide what wins.  And for the
middle-end say even for mode conversions decide what is widening and what is
narrowing even when they are unordered.

	Jakub
  
Kewen.Lin Dec. 15, 2022, 7:45 a.m. UTC | #9
>> I bet the above workaround in generic code was added for a reason; it would
>> surprise me if _Float128 worked at all without that hack.
> 
> OK, I'll have a look at those NaN failures soon.

By investigating the exposed NaN failures, I found they occur because a
_Float128 constant gets converted to a long double constant; the conversion
goes through function real_convert, which clears the signalling bit in the
context of !HONOR_SNANS (arg).

  if (r->cl == rvc_nan)
    r->signalling = 0;

The test cases don't use the explicit option -fsignaling-nans; I'm inclined
to believe that's intentional, since there is only an sNaN generation.  If so,
we don't want this kind of conversion, which is useless and can clear the
signalling bit unexpectedly.  One shortcut is to just copy the corresponding
REAL_VALUE_TYPE and rebuild it with the given type if the modes are the same.

-----
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index e80be8049e1..d036b09dc6f 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -2178,6 +2178,14 @@ fold_convert_const_real_from_real (tree type, const_tree arg1)
   REAL_VALUE_TYPE value;
   tree t;

+  /* If the underlying modes are the same, just copy the
+     TREE_REAL_CST information and rebuild with the given type.  */
+  if (TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (arg1)))
+    {
+      t = build_real (type, TREE_REAL_CST (arg1));
+      return t;
+    }
+
   /* Don't perform the operation if flag_signaling_nans is on
      and the operand is a signaling NaN.  */
   if (HONOR_SNANS (arg1)

-----

The above diff can fix all exposed NaN failures.

BR,
Kewen
  
Kewen.Lin Dec. 15, 2022, 7:54 a.m. UTC | #10
on 2022/12/14 18:33, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 06:11:26PM +0800, Kewen.Lin wrote:
>>> The hacks with different precisions of powerpc 128-bit floating types are
>>> very unfortunate, it is I assume because the middle-end asserted that scalar
>>> floating point types with different modes have different precision.
>>> We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
>>> the same 16-bit precision as well and e.g. C++ FE knows to treat standard
>>> vs. extended floating point types vs. other unknown floating point types
>>> differently in finding result type of binary operations or in which type
>>> comparisons will be done.  
>>
>> It's good news, for now those three long double modes on Power have different
>> precisions, if they can have the same precision, I'd expect the ICE should be
>> gone.
> 
> I'm talking mainly about r13-3292, the asserts now check different modes
> have different precision unless it is half vs. brain or vice versa, but
> could be changed further, but if the precision is the same, the FEs
> and the middle-end needs to know how to deal with those.
> For C++23, say when __ibm128 is the same as long double and _Float128 is
> supported, the 2 types are unordered (neither is a subset or superset of
> the other because there are many _Float128 values one can't represent
> in double double (whether anything with exponent larger than what double
> can represent or most of the more precise values), but because of the
> variable precision there are double double values that can't be represented
> in _Float128 either) and so we can error on comparisons of those or on
> arithmetics with such arguments (unless explicitly converted first).
> But for backwards compatibility we can't do that for __float128 vs. __ibm128
> and so need to backwards compatibly decide what wins.  And for the
> middle-end say even for mode conversions decide what is widening and what is
> narrowing even when they are unordered.

Thanks for the pointer!  I don't have a good understanding of the backwards
compatibility of those conversions; I'd guess Mike, Segher and David would
have more insights.

BR,
Kewen
  
Segher Boessenkool Dec. 15, 2022, 5:59 p.m. UTC | #11
Hi!

On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
> > Since function useless_type_conversion_p considers two float types compatible
> > if they have the same mode, it doesn't require explicit conversions between
> > these two types.  I think it's exactly what we want.  And to me, it looks unexpected
> > to have two types with the same mode but different precision.
> > 
> > So could we consider disabling the above workaround to make _Float128 have the same
> > precision as __float128 (long double) (the underlying TFmode)?  I tried the below
> > change:
> 
> The hacks with different precisions of powerpc 128-bit floating types are
> very unfortunate, it is I assume because the middle-end asserted that scalar
> floating point types with different modes have different precision.

IEEE QP and double-double cannot be ordered, neither represents a subset
of the other.  But the current middle end does require a total ordering
for all floating point types (that can be converted to each other).

Mike's precision hack was supposed to give us some time until an actual
fix was made.  But no one has worked on that, and there have been
failures found with the precision hack as well; it worked remarkably
well, but it isn't perfect.

We cannot move forward in a meaningful way until these problems are
fixed.  We can move around like headless chickens some more of course.


Segher
  
Joseph Myers Dec. 15, 2022, 6:28 p.m. UTC | #12
On Thu, 15 Dec 2022, Kewen.Lin via Gcc-patches wrote:

> By investigating the exposed NaN failures, I found they occur because a
> _Float128 constant gets converted to a long double constant; the conversion
> goes through function real_convert, which clears the signalling bit in the
> context of !HONOR_SNANS (arg).
> 
>   if (r->cl == rvc_nan)
>     r->signalling = 0;
> 
> The test cases don't use the explicit option -fsignaling-nans; I'm inclined
> to believe that's intentional, since there is only an sNaN generation.  If so,
> we don't want this kind of conversion, which is useless and can clear the
> signalling bit unexpectedly.  One shortcut is to just copy the corresponding
> REAL_VALUE_TYPE and rebuild it with the given type if the modes are the same.

I think this approach - treating floating-point conversions to a type with 
the same mode consistently as a copy rather than a convertFormat operation 
- is reasonable.
  
Segher Boessenkool Dec. 15, 2022, 6:49 p.m. UTC | #13
On Thu, Dec 15, 2022 at 06:28:19PM +0000, Joseph Myers wrote:
> On Thu, 15 Dec 2022, Kewen.Lin via Gcc-patches wrote:
> > By investigating the exposed NaN failures, I found they occur because a
> > _Float128 constant gets converted to a long double constant; the conversion
> > goes through function real_convert, which clears the signalling bit in the
> > context of !HONOR_SNANS (arg).
> > 
> >   if (r->cl == rvc_nan)
> >     r->signalling = 0;
> > 
> > The test cases don't use the explicit option -fsignaling-nans; I'm inclined
> > to believe that's intentional, since there is only an sNaN generation.  If so,
> > we don't want this kind of conversion, which is useless and can clear the
> > signalling bit unexpectedly.  One shortcut is to just copy the corresponding
> > REAL_VALUE_TYPE and rebuild it with the given type if the modes are the same.
> 
> I think this approach - treating floating-point conversions to a type with 
> the same mode consistently as a copy rather than a convertFormat operation 
> - is reasonable.

Certainly.  But different types with the same mode having different
precision is not so very reasonable, and will likely cause other
problems as well.

We cannot use precision to order modes or types, that is the core
problem.  A conversion from IEEE QP to double-double (or vice versa) is
neither widening nor narrowing.


Segher
  
Jakub Jelinek Dec. 15, 2022, 6:56 p.m. UTC | #14
On Thu, Dec 15, 2022 at 12:49:27PM -0600, Segher Boessenkool wrote:
> On Thu, Dec 15, 2022 at 06:28:19PM +0000, Joseph Myers wrote:
> > On Thu, 15 Dec 2022, Kewen.Lin via Gcc-patches wrote:
> > > By investigating the exposed NaN failures, I found they occur because a
> > > _Float128 constant gets converted to a long double constant; the conversion
> > > goes through function real_convert, which clears the signalling bit in the
> > > context of !HONOR_SNANS (arg).
> > > 
> > >   if (r->cl == rvc_nan)
> > >     r->signalling = 0;
> > > 
> > > The test cases don't use the explicit option -fsignaling-nans; I'm inclined
> > > to believe that's intentional, since there is only an sNaN generation.  If so,
> > > we don't want this kind of conversion, which is useless and can clear the
> > > signalling bit unexpectedly.  One shortcut is to just copy the corresponding
> > > REAL_VALUE_TYPE and rebuild it with the given type if the modes are the same.
> > 
> > I think this approach - treating floating-point conversions to a type with 
> > the same mode consistently as a copy rather than a convertFormat operation 
> > - is reasonable.
> 
> Certainly.  But different types with the same mode having different
> precision is not so very reasonable, and will likely cause other
> problems as well.
> 
> We cannot use precision to order modes or types, that is the core
> problem.  A conversion from IEEE QP to double-double (or vice versa) is
> neither widening nor narrowing.

Sure.  For optabs, I bet we don't necessarily need to care that much: if the
precision is the same, we can ask for a widening or narrowing conversion
and expect only one to be implemented, or both doing the same thing between
such modes.  But when using libcalls, which library function we use is quite
important, because not all of them might actually be implemented in the
library (better keep doing what we've done before).

	Jakub
  
Segher Boessenkool Dec. 15, 2022, 8:26 p.m. UTC | #15
On Thu, Dec 15, 2022 at 07:56:14PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 15, 2022 at 12:49:27PM -0600, Segher Boessenkool wrote:
> > Certainly.  But different types with the same mode having different
> > precision is not so very reasonable, and will likely cause other
> > problems as well.
> > 
> > We cannot use precision to order modes or types, that is the core
> > problem.  A conversion from IEEE QP to double-double (or vice versa) is
> > neither widening nor narrowing.
> 
> Sure.  For optabs, I bet we don't necessarily need to care that much, if
> precision is the same, we can ask for widening and narrowing conversion
> and expect only one to be implemented or both doing the same thing between
> such modes.  But when using libcalls, which library function we use is quite
> important because not all of them might be actually implemented in the
> library (better keep doing what we've done before).

I don't think we should name non-widenings widening, or non-narrowings
narrowing.  We should call conversions conversions.

Even if it doesn't cause technical problems the way it is now (of which
I am not convinced at all), faulty names like this are mightily
confusing.  This is a very real *practical* problem :-(


Segher
  
Michael Meissner Dec. 16, 2022, 12:09 a.m. UTC | #16
On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
> > > Since function useless_type_conversion_p considers two float types are compatible
> > > if they have the same mode, so it doesn't require the explicit conversions between
> > > these two types.  I think it's exactly what we want.  And to me, it looks unexpected
> > > to have two types with the same mode but different precision.
> > > 
> > > So could we consider disabling the above workaround to make _Float128 have the same
> > > precision as __float128 (long double) (the underlying TFmode)?  I tried the below
> > > change:
> > 
> > The hacks with different precisions of powerpc 128-bit floating types are
> > very unfortunate, it is I assume because the middle-end asserted that scalar
> > floating point types with different modes have different precision.
> 
> IEEE QP and double-double cannot be ordered, neither represents a subset
> of the other.  But the current middle end does require a total ordering
> for all floating point types (that can be converted to each other).
> 
> Mike's precision hack was supposed to give us some time until an actual
> fix was made.  But no one has worked on that, and there have been
> failures found with the precision hack as well; it worked remarkably
> well, but it isn't perfect.
> 
> We cannot move forward in a meaningful way until these problems are
> fixed.  We can move around like headless chickens some more of course.

In general I tend to think most of these automatic widenings are
problematic.  But there are cases where they make sense.

Let's see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
There you definitely want the compiler to automatically promote SDmode to
DDmode to do the arithmetic and then possibly convert the result back.
Similarly for the limited 16-bit floating point modes, where you have
operations to pack and unpack the object, but you have no arithmetic.

But I would argue that you NEVER want to automatically promote DFmode to either
KFmode, TFmode, or IFmode, since those modes are (almost) always going to be
slower than doing the emulation.  This is particularly true if we only support
a subset of operations, where some things can be done inline but a lot of
operations would need to be done via emulation (such as on power8, where we
don't have IEEE 128-bit support in hardware).

If the machine independent part of the compiler decides it can do an
operation because some operations are present (such as move, negate, absolute
value, and compare), then you will likely wind up promoting the 64-bit type(s)
to 128-bit, calling a slower 128-bit function, and then truncating the
value back to 64-bit, when calling a 64-bit emulation function would be faster.
And even if the operation does have a named insn, that doesn't mean you want
to use that operation in general.

I recall that in the past, for some x86 boxes, the 80-bit XFmode floating
point stack operations were really slow compared to the current
SFmode and DFmode SSE operations.  But for some of the older machines, they may
have been faster.  And choosing a different -march=<xxx> would change whether or
not you want to do the optimization.  Having these tables built statically can
be a recipe for disaster.  For floating point at least, I would prefer if a
target had an option to dispense with the statically built get_wider tables
and did everything via target hooks.

While for the PowerPC we want to control what the logical wider type is for
floating point types, I can imagine we don't want all backends to have to
implement these hooks if they just have the standard 2-3 floating point modes.

I purposely haven't been looking into 16-bit floating point support, but I
have to imagine that there you have the problem that there are at least 2-3
different 16-bit formats roaming around.  This is essentially the same issue
as on the PowerPC, where you have 2 128-bit floating point types, neither of
which is a pure subset of the other.

To my way of thinking, it is a many-branching tree.  On the PowerPC, you want
SDmode to promote to DDmode, and possibly to TDmode.  And SFmode would
promote to DFmode, but DFmode would not generally promote automatically to
IFmode, TFmode, or KFmode.  We don't have any machines that support it, but
let's say some machine wanted to support both decimal encodings (DPD and BID).
You would logically not want any DPD type to promote to a BID type or vice
versa.

Sure, explicit conversions would be allowed, but not the invisible conversions
done to promote the type.

In terms of these machine dependent types, there are some issues that show up
when a port creates these special types.

   1)	It would be nice if _Complex worked with MD types.  It is tiresome to
	have to use attribute((mode(...))) to get access to the complex variant
	of the type.

   2)	It would be nice if the machine back end could define its own set of
	suffixes for floating point constants.
  
Segher Boessenkool Dec. 16, 2022, 5:55 p.m. UTC | #17
Hi!

On Thu, Dec 15, 2022 at 07:09:38PM -0500, Michael Meissner wrote:
> On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> > On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > > The hacks with different precisions of powerpc 128-bit floating types are
> > > very unfortunate, it is I assume because the middle-end asserted that scalar
> > > floating point types with different modes have different precision.
> > 
> > IEEE QP and double-double cannot be ordered, neither represents a subset
> > of the other.  But the current middle end does require a total ordering
> > for all floating point types (that can be converted to each other).
> > 
> > Mike's precision hack was supposed to give us some time until an actual
> > fix was made.  But no one has worked on that, and there have been
> > failures found with the precision hack as well; it worked remarkably
> > well, but it isn't perfect.
> > 
> > We cannot move forward in a meaningful way until these problems are
> > fixed.  We can move around like headless chickens some more of course.
> 
> In general I tend to think most of these automatic widenings are
> problematic.  But there are cases where they make sense.

These things are *not* widening at all, that is the problem.  For some
values it is lossy, in either direction.

> Let's see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
> There you definitely want the compiler to automatically promote SDmode to DDmode to
> do the arithmetic and then possibly convert it back.  Similarly for the limited
> 16-bit floating point modes, where you have operations to pack and unpack the
> object, but you have no arithmetic.

And those things *are* widening, non-lossy in all cases.  Well-defined
etc.

> But I would argue that you NEVER want to automatically promote DFmode to either
> KFmode, TFmode, or IFmode, since those modes are (almost) always going to be
> slower than doing the emulation.  This is particularly true if we only support
> a subset of operations, where some things can be done inline, but a lot of
> operations would need to be done via emulation (such as on power8 where we
> don't have IEEE 128-bit support in hardware).

TFmode is either the same actual mode as either KFmode or IFmode, let's
reduce confusion by not talking about it at all anymore.

The middle end should never convert to another mode without a good
reason for it.  But OTOH, both IFmode and KFmode can represent all
values in DFmode, you just have to be careful about semantics when
eventually converting back.

> If the machine independent part of the compiler decides oh we can do this
> operation because some operations are present (such as move, negate, absolute
> value, and compare), then you likely will wind up promoting the 64-bit type(s)
> to 128-bit, doing a call to a slower 128-bit function, and then truncating the
> value back to 64-bit is faster than calling a 64-bit emulation function.

This does not in general have the correct semantics though (without
-ffast-math), so the compiler will not do things like it.

> While for the PowerPC, we want to control what is the logical wider type for
> floating point types, I can imagine we don't want all backends to have to
> implment these hooks if they just have the standard 2-3 floating point modes.

KFmode is not wider than IFmode.  IFmode is not wider than KFmode.
KFmode can represent values IFmode cannot.  IFmode can represent values
KFmode cannot.

> I purposely haven't been looking into 16-bit floating point support, but I
> have to imagine there you have the problem that there are at least 2-3
> different 16-bit formats roaming around.  This is essentially the same issue in
> the PowerPC where you have 2 128-bit floating point types, neither of which is
> a pure subset of the other.

There are at least six 16-bit float formats in actual use.  But we care
mostly about IEEE binary16 and about bfloat16, each with or without
non-normal values (and the encodings for those reused for extra normal
numbers, perhaps).

> To my way of thinking, it is a many branching tree.  On the PowerPC, you want
> SDmode to promote to DDmode,

But it is the backend that should be in control there, not generic code.
And that is the way things already work!

> and possibly to TDmode.  And SFmode would
> promote to DFmode,

In general we do not, it would not have the correct semantics.

Anyway, to get us back on track a bit:

There should not be any non-explicit conversions from KFmode to IFmode
or vice versa.  Each of those modes can represent values the other can
not.  There is no ordering between those modes in that sense, we should
not pretend there is one.


Segher
  
Michael Meissner Dec. 16, 2022, 9:53 p.m. UTC | #18
On Fri, Dec 16, 2022 at 11:55:27AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Dec 15, 2022 at 07:09:38PM -0500, Michael Meissner wrote:
> > On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> > > On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > > > The hacks with different precisions of powerpc 128-bit floating types are
> > > > very unfortunate, it is I assume because the middle-end asserted that scalar
> > > > floating point types with different modes have different precision.
> > > 
> > > IEEE QP and double-double cannot be ordered, neither represents a subset
> > > of the other.  But the current middle end does require a total ordering
> > > for all floating point types (that can be converted to each other).
> > > 
> > > Mike's precision hack was supposed to give us some time until an actual
> > > fix was made.  But no one has worked on that, and there have been
> > > failures found with the precision hack as well; it worked remarkably
> > > well, but it isn't perfect.
> > > 
> > > We cannot move forward in a meaningful way until these problems are
> > > fixed.  We can move around like headless chickens some more of course.
> > 
> > In general I tend to think most of these automatic widenings are
> > problematic.  But there are cases where they make sense.
> 
> These things are *not* widening at all, that is the problem.  For some
> values it is lossy, in either direction.

Ummm yes and no.

Going from SFmode to DFmode is a widening, as is SDmode to DDmode, since all
values within SFmode or SDmode can be represented in DFmode/DDmode.  That is
needed, since not all machines have full arithmetic support for the narrower
modes.

Going from DFmode to KFmode is still a widening, but in general we may want to
prevent it from happening due to the speed of KFmode operations compared to
DFmode.  Likewise, going from DFmode to IFmode is a widening, since all values
in DFmode can be represented in IFmode.

Obviously going from IFmode to KFmode or the reverse is where the issue is.

> > Let's see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
> > There you definitely want the compiler to automatically promote SDmode to DDmode to
> > do the arithmetic and then possibly convert it back.  Similarly for the limited
> > 16-bit floating point modes, where you have operations to pack and unpack the
> > object, but you have no arithmetic.
> 
> And those things *are* widening, non-lossy in all cases.  Well-defined
> etc.

Yes, but we just need to improve the hooks to prevent cases where it is not
defined.

> > But I would argue that you NEVER want to automatically promote DFmode to either
> > KFmode, TFmode, or IFmode, since those modes are (almost) always going to be
> > slower than doing the emulation.  This is particularly true if we only support
> > a subset of operations, where some things can be done inline, but a lot of
> > operations would need to be done via emulation (such as on power8 where we
> > don't have IEEE 128-bit support in hardware).
> 
> TFmode is either the same actual mode as either KFmode or IFmode, let's
> reduce confusion by not talking about it at all anymore.
> 
> The middle end should never convert to another mode without a good
> reason for it.  But OTOH, both IFmode and KFmode can represent all
> values in DFmode, you just have to be careful about semantics when
> eventually converting back.

It doesn't.  The issue arises when you call a built-in function and that
built-in function uses different types than what you pass.  Then conversions
have to be inserted.

> > If the machine independent part of the compiler decides oh we can do this
> > operation because some operations are present (such as move, negate, absolute
> > value, and compare), then you likely will wind up promoting the 64-bit type(s)
> > to 128-bit, calling a slower 128-bit function, and then truncating the
> > value back to 64-bit, when calling a 64-bit emulation function would be faster.
> 
> This does not in general have the correct semantics though (without
> -ffast-math), so the compiler will not do things like it.

It would happen if we didn't set the hooks to prevent it, but we did.  Maybe
there are places that need more hooks.

> > While for the PowerPC, we want to control what is the logical wider type for
> > floating point types, I can imagine we don't want all backends to have to
> > implement these hooks if they just have the standard 2-3 floating point modes.
> 
> KFmode is not wider than IFmode.  IFmode is not wider than KFmode.
> KFmode can represent values IFmode cannot.  IFmode can represent valuse
> KFmode cannot.

There the issue is the historical assumption in GCC that a single number
(precision) says whether one type is wider than another.  And to some extent,
precision is ambiguous: do you want precision to mean the number of bytes in a
value, the mantissa size, the exponent size, or whether special cases matter?

> > I purposely haven't been looking into 16-bit floating point support, but I
> > have to imagine there you have the problem that there are at least 2-3
> > different 16-bit formats roaming around.  This is essentially the same issue in
> > the PowerPC where you have 2 128-bit floating point types, neither of which is
> > a pure subset of the other.
> 
> There are at least six 16-bit float formats in actual use.  But we care
> mostly about IEEE binary16 and about bfloat16, each with or without
> non-normal values (and the encodings for those reused for extra normal
> numbers, perhaps).

You and I may only care about those two types.  Other port maintainers may need
to consider any of the other variants.

> > To my way of thinking, it is a many branching tree.  On the PowerPC, you want
> > SDmode to promote to DDmode,
> 
> But it is the backend that should be in control there, not generic code.
> And that is the way things already work!

No, that is not the way things work.  The way the types are laid out demands
that each precision is unique within the class of binary vs. decimal, and
that, without hooks, for any floating point type you can ask for a wider
precision and it will give you one.  If we didn't have the requirement that
there only be one type per precision, things would be simpler.  We would need
some way of going to the wider type rather than the pre-built tables we
currently use.

> > and possibly to TDmode.  And SFmode mode would
> > promote to DFmode,
> 
> In general we do not, it would not have the correct semantics.

When you combine two operands of different types in a binary operator, you
have to either convert one side to the other's type, or convert both sides to
a common type.

> Anyway, to get us back on track a bit:
> 
> There should not be any non-explicit conversions from KFmode to IFmode
> or vice versa.  Each of those modes can represent values the other can
> not.  There is no ordering between those modes in that sense, we should
> not pretend there is one.

And we have already prevented that (subject to places that were missed).  But
when a built-in is called with another type, it forces conversions to be done.

In the specific cases, since GCC currently uses different internal types for
__float128 and _Float128 when long double is IEEE 128-bit, if, for instance,
you have a __float128 variable and you assign it the result of the
__builtin_nanq function, that involves a conversion from the _Float128 type to
__float128.

In the case where long double is IBM double-double or just double, __float128
and _Float128 are the same internal type, and no conversion is done.
  
Michael Meissner Jan. 11, 2023, 8:24 p.m. UTC | #19
On Tue, Nov 01, 2022 at 10:42:30PM -0400, Michael Meissner wrote:
> This patch fixes the issue that GCC cannot build when the default long double
> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> during the evrp pass.  Ultimately it is failing because the code declared the
> type to use TFmode but it used F128 functions (i.e. KFmode).

Unfortunately, this patch no longer works against the trunk.  I have a simpler
patch to libgcc that uses the _Complex _Float128 and _Float128 types for
building the IEEE 128-bit support in libgcc.  It doesn't fix the problem in the
compiler, but it will allow us to go forward and build GCC on targets where
long double defaults to IEEE 128-bit floating point (e.g. Fedora 36).
  

Patch

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 90ab39dc258..e5298f45363 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -730,25 +730,28 @@  rs6000_init_builtins (void)
 
   if (TARGET_FLOAT128_TYPE)
     {
-      if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
-	ieee128_float_type_node = long_double_type_node;
-      else
+      /* In the past we used long_double_type_node when long double was IEEE
+	 128-bit.  However, this means that the _Float128 type
+	 (i.e. float128_type_node) is a different type from __float128
+	 (i.e. ieee128_float_type_node).  This leads to some corner cases,
+	 such as processing signaling NaNs with the nansf128 built-in function
+	 (which returns a _Float128 value) and assigning it to a long double or
+	 __float128 value.  The two explicit IEEE 128-bit types should always
+	 use the same internal type.
+
+	 For C we only need to register the __ieee128 name for it.  For C++, we
+	 create a distinct type which will mangle differently (u9__ieee128)
+	 vs. _Float128 (DF128_) and behave backwards compatibly.  */
+      if (float128t_type_node == NULL_TREE)
 	{
-	  /* For C we only need to register the __ieee128 name for
-	     it.  For C++, we create a distinct type which will mangle
-	     differently (u9__ieee128) vs. _Float128 (DF128_) and behave
-	     backwards compatibly.  */
-	  if (float128t_type_node == NULL_TREE)
-	    {
-	      float128t_type_node = make_node (REAL_TYPE);
-	      TYPE_PRECISION (float128t_type_node)
-		= TYPE_PRECISION (float128_type_node);
-	      layout_type (float128t_type_node);
-	      SET_TYPE_MODE (float128t_type_node,
-			     TYPE_MODE (float128_type_node));
-	    }
-	  ieee128_float_type_node = float128t_type_node;
+	  float128t_type_node = make_node (REAL_TYPE);
+	  TYPE_PRECISION (float128t_type_node)
+	    = TYPE_PRECISION (float128_type_node);
+	  layout_type (float128t_type_node);
+	  SET_TYPE_MODE (float128t_type_node,
+			 TYPE_MODE (float128_type_node));
 	}
+      ieee128_float_type_node = float128t_type_node;
       t = build_qualified_type (ieee128_float_type_node, TYPE_QUAL_CONST);
       lang_hooks.types.register_builtin_type (ieee128_float_type_node,
 					      "__ieee128");
@@ -3265,13 +3268,13 @@  htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
 
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
-   (and in mode MODE if that's convenient).
+   (and in mode RETURN_MODE if that's convenient).
    SUBTARGET may be used as the target for computing one of EXP's operands.
    IGNORE is nonzero if the value is to be ignored.
    Use the new builtin infrastructure.  */
 rtx
 rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
-		       machine_mode /* mode */, int ignore)
+		       machine_mode return_mode, int ignore)
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   enum rs6000_gen_builtins fcode
@@ -3287,78 +3290,99 @@  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
   size_t uns_fcode = (size_t)fcode;
   enum insn_code icode = rs6000_builtin_info[uns_fcode].icode;
 
-  /* TODO: The following commentary and code is inherited from the original
-     builtin processing code.  The commentary is a bit confusing, with the
-     intent being that KFmode is always IEEE-128, IFmode is always IBM
-     double-double, and TFmode is the current long double.  The code is
-     confusing in that it converts from KFmode to TFmode pattern names,
-     when the other direction is more intuitive.  Try to address this.  */
-
-  /* We have two different modes (KFmode, TFmode) that are the IEEE
-     128-bit floating point type, depending on whether long double is the
-     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
-     It is simpler if we only define one variant of the built-in function,
-     and switch the code when defining it, rather than defining two built-
-     ins and using the overload table in rs6000-c.cc to switch between the
-     two.  If we don't have the proper assembler, don't do this switch
-     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
-  if (FLOAT128_IEEE_P (TFmode))
-    switch (icode)
-      {
-      case CODE_FOR_sqrtkf2_odd:
-	icode = CODE_FOR_sqrttf2_odd;
-	break;
-      case CODE_FOR_trunckfdf2_odd:
-	icode = CODE_FOR_trunctfdf2_odd;
-	break;
-      case CODE_FOR_addkf3_odd:
-	icode = CODE_FOR_addtf3_odd;
-	break;
-      case CODE_FOR_subkf3_odd:
-	icode = CODE_FOR_subtf3_odd;
-	break;
-      case CODE_FOR_mulkf3_odd:
-	icode = CODE_FOR_multf3_odd;
-	break;
-      case CODE_FOR_divkf3_odd:
-	icode = CODE_FOR_divtf3_odd;
-	break;
-      case CODE_FOR_fmakf4_odd:
-	icode = CODE_FOR_fmatf4_odd;
-	break;
-      case CODE_FOR_xsxexpqp_kf:
-	icode = CODE_FOR_xsxexpqp_tf;
-	break;
-      case CODE_FOR_xsxsigqp_kf:
-	icode = CODE_FOR_xsxsigqp_tf;
-	break;
-      case CODE_FOR_xststdcnegqp_kf:
-	icode = CODE_FOR_xststdcnegqp_tf;
-	break;
-      case CODE_FOR_xsiexpqp_kf:
-	icode = CODE_FOR_xsiexpqp_tf;
-	break;
-      case CODE_FOR_xsiexpqpf_kf:
-	icode = CODE_FOR_xsiexpqpf_tf;
-	break;
-      case CODE_FOR_xststdcqp_kf:
-	icode = CODE_FOR_xststdcqp_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_eq_kf:
-	icode = CODE_FOR_xscmpexpqp_eq_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_lt_kf:
-	icode = CODE_FOR_xscmpexpqp_lt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_gt_kf:
-	icode = CODE_FOR_xscmpexpqp_gt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_unordered_kf:
-	icode = CODE_FOR_xscmpexpqp_unordered_tf;
-	break;
-      default:
-	break;
-      }
+  /* For 128-bit long double, we may need to convert both the KFmode
+     built-in functions and IFmode built-in functions to the equivalent
+     TFmode built-in function, if either a TFmode result is expected or any
+     of the arguments use TFmode.  */
+  if (TARGET_LONG_DOUBLE_128)
+    {
+      bool uses_tf_mode = return_mode == TFmode;
+      if (!uses_tf_mode)
+	{
+	  call_expr_arg_iterator iter;
+	  tree arg;
+	  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+	    {
+	      if (arg != error_mark_node
+		  && TYPE_MODE (TREE_TYPE (arg)) == TFmode)
+		{
+		  uses_tf_mode = true;
+		  break;
+		}
+	    }
+	}
+
+      /* Convert KFmode built-in functions to TFmode when long double is IEEE
+	 128-bit.  */
+      if (uses_tf_mode && FLOAT128_IEEE_P (TFmode))
+	switch (icode)
+	  {
+	  case CODE_FOR_sqrtkf2_odd:
+	    icode = CODE_FOR_sqrttf2_odd;
+	    break;
+	  case CODE_FOR_trunckfdf2_odd:
+	    icode = CODE_FOR_trunctfdf2_odd;
+	    break;
+	  case CODE_FOR_addkf3_odd:
+	    icode = CODE_FOR_addtf3_odd;
+	    break;
+	  case CODE_FOR_subkf3_odd:
+	    icode = CODE_FOR_subtf3_odd;
+	    break;
+	  case CODE_FOR_mulkf3_odd:
+	    icode = CODE_FOR_multf3_odd;
+	    break;
+	  case CODE_FOR_divkf3_odd:
+	    icode = CODE_FOR_divtf3_odd;
+	    break;
+	  case CODE_FOR_fmakf4_odd:
+	    icode = CODE_FOR_fmatf4_odd;
+	    break;
+	  case CODE_FOR_xsxexpqp_kf:
+	    icode = CODE_FOR_xsxexpqp_tf;
+	    break;
+	  case CODE_FOR_xsxsigqp_kf:
+	    icode = CODE_FOR_xsxsigqp_tf;
+	    break;
+	  case CODE_FOR_xststdcnegqp_kf:
+	    icode = CODE_FOR_xststdcnegqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqp_kf:
+	    icode = CODE_FOR_xsiexpqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqpf_kf:
+	    icode = CODE_FOR_xsiexpqpf_tf;
+	    break;
+	  case CODE_FOR_xststdcqp_kf:
+	    icode = CODE_FOR_xststdcqp_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_eq_kf:
+	    icode = CODE_FOR_xscmpexpqp_eq_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_lt_kf:
+	    icode = CODE_FOR_xscmpexpqp_lt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_gt_kf:
+	    icode = CODE_FOR_xscmpexpqp_gt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_unordered_kf:
+	    icode = CODE_FOR_xscmpexpqp_unordered_tf;
+	    break;
+	  default:
+	    break;
+	  }
+
+      /* Convert IFmode built-in functions to TFmode when long double is IBM
+	 128-bit.  */
+      else if (uses_tf_mode && FLOAT128_IBM_P (TFmode))
+	{
+	  if (icode == CODE_FOR_packif)
+	    icode = CODE_FOR_packtf;
+
+	  else if (icode == CODE_FOR_unpackif)
+	    icode = CODE_FOR_unpacktf;
+	}
+    }
 
   /* In case of "#pragma target" changes, we initialize all builtins
      but check for actual availability now, during expand time.  For
@@ -3481,18 +3505,6 @@  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
     {
-      if (fcode == RS6000_BIF_PACK_IF)
-	{
-	  icode = CODE_FOR_packtf;
-	  fcode = RS6000_BIF_PACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
-      else if (fcode == RS6000_BIF_UNPACK_IF)
-	{
-	  icode = CODE_FOR_unpacktf;
-	  fcode = RS6000_BIF_UNPACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
     }
 
   /* TRUE iff the built-in function returns void.  */
@@ -3647,7 +3659,24 @@  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   for (int i = 0; i < nargs; i++)
     if (!insn_data[icode].operand[i+k].predicate (op[i], mode[i+k]))
-      op[i] = copy_to_mode_reg (mode[i+k], op[i]);
+      {
+	/* If the predicate failed because the modes are different, do a
+	   convert instead of copy_to_mode_reg, since copy_to_mode_reg will
+	   abort in this case.  The modes might be different if we have two
+	   different 128-bit floating point modes (i.e. KFmode/TFmode if long
+	   double is IEEE 128-bit and IFmode/TFmode if long double is IBM
+	   128-bit).  */
+	machine_mode mode_insn = mode[i+k];
+	machine_mode mode_op = GET_MODE (op[i]);
+	if (mode_insn != mode_op && mode_op != VOIDmode)
+	  {
+	    rtx tmp = gen_reg_rtx (mode_insn);
+	    convert_move (tmp, op[i], 0);
+	    op[i] = tmp;
+	  }
+	else
+	  op[i] = copy_to_mode_reg (mode_insn, op[i]);
+      }
 
   rtx pat;
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cfb6227e27b..8a8357512c0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23851,15 +23851,23 @@  rs6000_eh_return_filter_mode (void)
   return TARGET_32BIT ? SImode : word_mode;
 }
 
-/* Target hook for translate_mode_attribute.  */
+/* Target hook for translate_mode_attribute.
+
+   When -mabi=ieeelongdouble is used, we want to translate either KFmode or
+   TFmode to KFmode.  This is because user code that wants to specify IEEE
+   128-bit types will use either TFmode or KFmode, and we really want to use
+   the _Float128 and __float128 types instead of long double.
+
+   Similarly when -mabi=ibmlongdouble is used, we want to map IFmode into
+   TFmode.  */
 static machine_mode
 rs6000_translate_mode_attribute (machine_mode mode)
 {
-  if ((FLOAT128_IEEE_P (mode)
-       && ieee128_float_type_node == long_double_type_node)
-      || (FLOAT128_IBM_P (mode)
-	  && ibm128_float_type_node == long_double_type_node))
+  if (FLOAT128_IBM_P (mode)
+      && ibm128_float_type_node == long_double_type_node)
     return COMPLEX_MODE_P (mode) ? E_TCmode : E_TFmode;
+  else if (FLOAT128_IEEE_P (mode))
+    return COMPLEX_MODE_P (mode) ? E_KCmode : E_KFmode;
   return mode;
 }
 
@@ -23895,13 +23903,10 @@  rs6000_libgcc_floating_mode_supported_p (scalar_float_mode mode)
     case E_TFmode:
       return true;
 
-      /* We only return true for KFmode if IEEE 128-bit types are supported, and
-	 if long double does not use the IEEE 128-bit format.  If long double
-	 uses the IEEE 128-bit format, it will use TFmode and not KFmode.
-	 Because the code will not use KFmode in that case, there will be aborts
-	 because it can't find KFmode in the Floatn types.  */
+      /* We only return true for KFmode if IEEE 128-bit types are
+	 supported.  */
     case E_KFmode:
-      return TARGET_FLOAT128_TYPE && !TARGET_IEEEQUAD;
+      return TARGET_FLOAT128_TYPE;
 
     default:
       return false;
@@ -23935,7 +23940,7 @@  rs6000_floatn_mode (int n, bool extended)
 
 	case 64:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
@@ -23959,7 +23964,7 @@  rs6000_floatn_mode (int n, bool extended)
 
 	case 128:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw12.c b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
new file mode 100644
index 00000000000..d08b4cbc883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
@@ -0,0 +1,137 @@ 
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Ensure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with _Float128.  This is the same as float128-hw4.c, except the type
+   _Float128 is used, and the IEEE 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw13.c b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
new file mode 100644
index 00000000000..51a3cd4802b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
@@ -0,0 +1,137 @@ 
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Ensure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with __float128.  This is the same as float128-hw4.c, except the type
+   __float128 is used, and the IBM 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE __float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
index fc149169bc6..3f6717825b7 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
@@ -118,6 +118,11 @@  nfms_odd (TYPE a, TYPE b, TYPE c)
   return -__builtin_fmaf128_round_to_odd (a, b, -c);
 }
 
+/* When using long double instead of _Float128, we might not be able to
+   optimize __builtin_fmaf128_round_to_odd (a, b, -c) into using xsmsubqpo
+   instead of xsnegqp and xsmaddqpo due to conversions between TFmode and
+   KFmode.  So just recognize that it did the FMA optimization.  */
+
 /* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
@@ -126,11 +131,8 @@  nfms_odd (TYPE a, TYPE b, TYPE c)
 /* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
-/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsn?m(add|sub)qpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
-/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
 /* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
 /* { dg-final { scan-assembler-not {\mbl\M}         } } */