[2/3,V3] PR 107299, Make __float128 use the _Float128 type

Message ID Y5ozVm5L+crILT33@toto.the-meissners.org
State New
Headers
Series [1/3,V3] PR 107299, Rework 128-bit complex multiply and divide |

Commit Message

Michael Meissner Dec. 14, 2022, 8:34 p.m. UTC
  This patch fixes the issue that GCC cannot build when the default long double
is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
during the evrp pass.  Ultimately it is failing because the code declared the
internal type for one IEEE 128-bit floating point type, and NaN functions use a
different IEEE 128-bit floating point type.

Gimple-range-fold uses the internal types, but there are similar problems when
the code is converted to RTL and the two different modes (KFmode, TFmode) are
used.

	typedef float TFtype __attribute__((mode (TF)));
	typedef __complex float TCtype __attribute__((mode (TC)));

	TCtype
	__mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
	{
	  TFtype ac, bd, ad, bc, x, y;
	  TCtype res;

	  ac = a * c;
	  bd = b * d;
	  ad = a * d;
	  bc = b * c;

	  x = ac - bd;
	  y = ad + bc;

	  if (__builtin_isnan (x) && __builtin_isnan (y))
	    {
	      _Bool recalc = 0;
	      if (__builtin_isinf (a) || __builtin_isinf (b))
		{

		  a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
		  b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (__builtin_isinf (c) || __builtin_isinf (d))
		{

		  c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
		  d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  recalc = 1;
		}
	      if (!recalc
		  && (__builtin_isinf (ac) || __builtin_isinf (bd)
		      || __builtin_isinf (ad) || __builtin_isinf (bc)))
		{

		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (recalc)
		{
		  x = __builtin_inff128 () * (a * c - b * d);
		  y = __builtin_inff128 () * (a * d + b * c);
		}
	    }

	  __real__ res = x;
	  __imag__ res = y;
	  return res;
	}

Currently GCC uses the long double type node for __float128 if long double is
IEEE 128-bit.  It did not use the node for _Float128.

Originally this was noticed if you call the nansq function to make a signaling
NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
different from __float128, the machine independent code converts signaling NaNs
to quiet NaNs if the types are not compatible.  The following tests used to
fail when run on a system where long double is IEEE 128-bit:

	gcc.dg/torture/float128-nan.c
	gcc.target/powerpc/nan128-1.c

This patch makes both __float128 and _Float128 use the same type node.

One side effect of not using the long double type node for __float128 is that we
must only use KFmode for _Float128/__float128.  The libstdc++ library won't
build if we use TFmode for _Float128 and __float128 when long double is IEEE
128-bit.

Another minor side effect is that the f128 round to odd fused multiply-add
function will not merge negatition with the FMA operation when the type is long
double.  If the type is __float128 or _Float128, then it will continue to do the
optimization.  The round to odd functions are defined in terms of __float128
arguments.  For example:

	long double
	do_fms (long double a, long double b, long double c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate (assuming -mabi=ieeelongdouble):

	xsnegqp 4,4
	xsmaddqpo 4,2,3
	xxlor 34,36,36

while:

	__float128
	do_fms (__float128 a, __float128 b, __float128 c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate:

	xsmsubqpo 4,2,3
	xxlor 34,36,36

Assuming this patch goes in, we can open a bug about the above optimizations not
working.  However, given that the functions are explicitly documented to use
__float128 types, and the code in the test is using long double, I don't think
it is a high priority issue.  The user should use the documented types.

I did experiment to do the support, and to to it properly, you need a bunch of
insns used by the combiner to deal with combining the conversion from TFmode to
KFmode along with the optimization in order to eventually combine the multiple
and add/subtract operations into a separate FMA.

I tested all 3 patchs for PR target/107299 on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-12-14   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the
	_Float128 type for __float128.
	(rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the
	built-in passes or returns TFmode.  If the predicate failed because the
	modes were different, use convert_move to load up the value instead of
	copy_to_mode_reg.
	* config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't
	translate IEEE 128-bit floating point modes to explicit IEEE 128-bit
	modes (KFmode or KCmode), even if long double is IEEE 128-bit.
	(rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the
	time if we support IEEE 128-bit floating point.
	(rs6000_floatn_mode): _Float128 and _Float128x always uses KFmode.

gcc/testsuite/

	PR target/107299
	* gcc.target/powerpc/float128-hw12.c: New test.
	* gcc.target/powerpc/float128-hw13.c: Likewise.
	* gcc.target/powerpc/float128-hw4.c: Update insns.
---
 gcc/config/rs6000/rs6000-builtin.cc           | 237 ++++++++++--------
 gcc/config/rs6000/rs6000.cc                   |  31 ++-
 .../gcc.target/powerpc/float128-hw12.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw13.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw4.c         |  10 +-
 5 files changed, 431 insertions(+), 121 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw12.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw13.c
  

Patch

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 90ab39dc258..e5298f45363 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -730,25 +730,28 @@  rs6000_init_builtins (void)
 
   if (TARGET_FLOAT128_TYPE)
     {
-      if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
-	ieee128_float_type_node = long_double_type_node;
-      else
+      /* In the past we used long_double_type_node when long double was IEEE
+	 128-bit.  However, this means that the _Float128 type
+	 (i.e. float128_type_node) is a different type from __float128
+	 (i.e. ieee128_float_type_nonde).  This leads to some corner cases,
+	 such as processing signaling NaNs with the nansf128 built-in function
+	 (which returns a _Float128 value) and assign it to a long double or
+	 __float128 value.  The two explicit IEEE 128-bit types should always
+	 use the same internal type.
+
+	 For C we only need to register the __ieee128 name for it.  For C++, we
+	 create a distinct type which will mangle differently (u9__ieee128)
+	 vs. _Float128 (DF128_) and behave backwards compatibly.  */
+      if (float128t_type_node == NULL_TREE)
 	{
-	  /* For C we only need to register the __ieee128 name for
-	     it.  For C++, we create a distinct type which will mangle
-	     differently (u9__ieee128) vs. _Float128 (DF128_) and behave
-	     backwards compatibly.  */
-	  if (float128t_type_node == NULL_TREE)
-	    {
-	      float128t_type_node = make_node (REAL_TYPE);
-	      TYPE_PRECISION (float128t_type_node)
-		= TYPE_PRECISION (float128_type_node);
-	      layout_type (float128t_type_node);
-	      SET_TYPE_MODE (float128t_type_node,
-			     TYPE_MODE (float128_type_node));
-	    }
-	  ieee128_float_type_node = float128t_type_node;
+	  float128t_type_node = make_node (REAL_TYPE);
+	  TYPE_PRECISION (float128t_type_node)
+	    = TYPE_PRECISION (float128_type_node);
+	  layout_type (float128t_type_node);
+	  SET_TYPE_MODE (float128t_type_node,
+			 TYPE_MODE (float128_type_node));
 	}
+      ieee128_float_type_node = float128t_type_node;
       t = build_qualified_type (ieee128_float_type_node, TYPE_QUAL_CONST);
       lang_hooks.types.register_builtin_type (ieee128_float_type_node,
 					      "__ieee128");
@@ -3265,13 +3268,13 @@  htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
 
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
-   (and in mode MODE if that's convenient).
+   (and in mode RETURN_MODE if that's convenient).
    SUBTARGET may be used as the target for computing one of EXP's operands.
    IGNORE is nonzero if the value is to be ignored.
    Use the new builtin infrastructure.  */
 rtx
 rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
-		       machine_mode /* mode */, int ignore)
+		       machine_mode return_mode, int ignore)
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   enum rs6000_gen_builtins fcode
@@ -3287,78 +3290,99 @@  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
   size_t uns_fcode = (size_t)fcode;
   enum insn_code icode = rs6000_builtin_info[uns_fcode].icode;
 
-  /* TODO: The following commentary and code is inherited from the original
-     builtin processing code.  The commentary is a bit confusing, with the
-     intent being that KFmode is always IEEE-128, IFmode is always IBM
-     double-double, and TFmode is the current long double.  The code is
-     confusing in that it converts from KFmode to TFmode pattern names,
-     when the other direction is more intuitive.  Try to address this.  */
-
-  /* We have two different modes (KFmode, TFmode) that are the IEEE
-     128-bit floating point type, depending on whether long double is the
-     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
-     It is simpler if we only define one variant of the built-in function,
-     and switch the code when defining it, rather than defining two built-
-     ins and using the overload table in rs6000-c.cc to switch between the
-     two.  If we don't have the proper assembler, don't do this switch
-     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
-  if (FLOAT128_IEEE_P (TFmode))
-    switch (icode)
-      {
-      case CODE_FOR_sqrtkf2_odd:
-	icode = CODE_FOR_sqrttf2_odd;
-	break;
-      case CODE_FOR_trunckfdf2_odd:
-	icode = CODE_FOR_trunctfdf2_odd;
-	break;
-      case CODE_FOR_addkf3_odd:
-	icode = CODE_FOR_addtf3_odd;
-	break;
-      case CODE_FOR_subkf3_odd:
-	icode = CODE_FOR_subtf3_odd;
-	break;
-      case CODE_FOR_mulkf3_odd:
-	icode = CODE_FOR_multf3_odd;
-	break;
-      case CODE_FOR_divkf3_odd:
-	icode = CODE_FOR_divtf3_odd;
-	break;
-      case CODE_FOR_fmakf4_odd:
-	icode = CODE_FOR_fmatf4_odd;
-	break;
-      case CODE_FOR_xsxexpqp_kf:
-	icode = CODE_FOR_xsxexpqp_tf;
-	break;
-      case CODE_FOR_xsxsigqp_kf:
-	icode = CODE_FOR_xsxsigqp_tf;
-	break;
-      case CODE_FOR_xststdcnegqp_kf:
-	icode = CODE_FOR_xststdcnegqp_tf;
-	break;
-      case CODE_FOR_xsiexpqp_kf:
-	icode = CODE_FOR_xsiexpqp_tf;
-	break;
-      case CODE_FOR_xsiexpqpf_kf:
-	icode = CODE_FOR_xsiexpqpf_tf;
-	break;
-      case CODE_FOR_xststdcqp_kf:
-	icode = CODE_FOR_xststdcqp_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_eq_kf:
-	icode = CODE_FOR_xscmpexpqp_eq_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_lt_kf:
-	icode = CODE_FOR_xscmpexpqp_lt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_gt_kf:
-	icode = CODE_FOR_xscmpexpqp_gt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_unordered_kf:
-	icode = CODE_FOR_xscmpexpqp_unordered_tf;
-	break;
-      default:
-	break;
-      }
+  /* For 128-bit long double, we may need both the KFmode built-in functions
+     and IFmode built-in functions to the equivalent TFmode built-in function,
+     if either a TFmode result is expected or any of the arguments use
+     TFmode.  */
+  if (TARGET_LONG_DOUBLE_128)
+    {
+      bool uses_tf_mode = return_mode == TFmode;
+      if (!uses_tf_mode)
+	{
+	  call_expr_arg_iterator iter;
+	  tree arg;
+	  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+	    {
+	      if (arg != error_mark_node
+		  && TYPE_MODE (TREE_TYPE (arg)) == TFmode)
+		{
+		  uses_tf_mode = true;
+		  break;
+		}
+	    }
+	}
+
+      /* Convert KFmode built-in functions to TFmode when long double is IEEE
+	 128-bit.  */
+      if (uses_tf_mode && FLOAT128_IEEE_P (TFmode))
+	switch (icode)
+	  {
+	  case CODE_FOR_sqrtkf2_odd:
+	    icode = CODE_FOR_sqrttf2_odd;
+	    break;
+	  case CODE_FOR_trunckfdf2_odd:
+	    icode = CODE_FOR_trunctfdf2_odd;
+	    break;
+	  case CODE_FOR_addkf3_odd:
+	    icode = CODE_FOR_addtf3_odd;
+	    break;
+	  case CODE_FOR_subkf3_odd:
+	    icode = CODE_FOR_subtf3_odd;
+	    break;
+	  case CODE_FOR_mulkf3_odd:
+	    icode = CODE_FOR_multf3_odd;
+	    break;
+	  case CODE_FOR_divkf3_odd:
+	    icode = CODE_FOR_divtf3_odd;
+	    break;
+	  case CODE_FOR_fmakf4_odd:
+	    icode = CODE_FOR_fmatf4_odd;
+	    break;
+	  case CODE_FOR_xsxexpqp_kf:
+	    icode = CODE_FOR_xsxexpqp_tf;
+	    break;
+	  case CODE_FOR_xsxsigqp_kf:
+	    icode = CODE_FOR_xsxsigqp_tf;
+	    break;
+	  case CODE_FOR_xststdcnegqp_kf:
+	    icode = CODE_FOR_xststdcnegqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqp_kf:
+	    icode = CODE_FOR_xsiexpqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqpf_kf:
+	    icode = CODE_FOR_xsiexpqpf_tf;
+	    break;
+	  case CODE_FOR_xststdcqp_kf:
+	    icode = CODE_FOR_xststdcqp_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_eq_kf:
+	    icode = CODE_FOR_xscmpexpqp_eq_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_lt_kf:
+	    icode = CODE_FOR_xscmpexpqp_lt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_gt_kf:
+	    icode = CODE_FOR_xscmpexpqp_gt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_unordered_kf:
+	    icode = CODE_FOR_xscmpexpqp_unordered_tf;
+	    break;
+	  default:
+	    break;
+	  }
+
+      /* Convert IFmode built-in functions to TFmode when long double is IBM
+	 128-bit.  */
+      else if (uses_tf_mode && FLOAT128_IBM_P (TFmode))
+	{
+	  if (icode == CODE_FOR_packif)
+	    icode = CODE_FOR_packtf;
+
+	  else if (icode == CODE_FOR_unpackif)
+	    icode = CODE_FOR_unpacktf;
+	}
+    }
 
   /* In case of "#pragma target" changes, we initialize all builtins
      but check for actual availability now, during expand time.  For
@@ -3481,18 +3505,6 @@  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
     {
-      if (fcode == RS6000_BIF_PACK_IF)
-	{
-	  icode = CODE_FOR_packtf;
-	  fcode = RS6000_BIF_PACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
-      else if (fcode == RS6000_BIF_UNPACK_IF)
-	{
-	  icode = CODE_FOR_unpacktf;
-	  fcode = RS6000_BIF_UNPACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
     }
 
   /* TRUE iff the built-in function returns void.  */
@@ -3647,7 +3659,24 @@  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   for (int i = 0; i < nargs; i++)
     if (!insn_data[icode].operand[i+k].predicate (op[i], mode[i+k]))
-      op[i] = copy_to_mode_reg (mode[i+k], op[i]);
+      {
+	/* If the predicate failed because the modes are different, do a
+	   convert instead of copy_to_mode_reg, since copy_to_mode_reg will
+	   abort in this case.  The modes might be different if we have two
+	   different 128-bit floating point modes (i.e. KFmode/TFmode if long
+	   double is IEEE 128-bit and IFmode/TFmode if long double is IBM
+	   128-bit).  */
+	machine_mode mode_insn = mode[i+k];
+	machine_mode mode_op = GET_MODE (op[i]);
+	if (mode_insn != mode_op && mode_op != VOIDmode)
+	  {
+	    rtx tmp = gen_reg_rtx (mode_insn);
+	    convert_move (tmp, op[i], 0);
+	    op[i] = tmp;
+	  }
+	else
+	  op[i] = copy_to_mode_reg (mode_insn, op[i]);
+      }
 
   rtx pat;
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6d08f6ed1fb..604f6a9ce33 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23819,15 +23819,23 @@  rs6000_eh_return_filter_mode (void)
   return TARGET_32BIT ? SImode : word_mode;
 }
 
-/* Target hook for translate_mode_attribute.  */
+/* Target hook for translate_mode_attribute.
+
+   When -mabi=ieeelongdouble is used, we want to translate either KFmode or
+   TFmode to KFmode.  This is because user code that wants to specify IEEE
+   128-bit types will use either TFmode or KFmode, and we really want to use
+   the _Float128 and __float128 types instead of long double.
+
+   Similarly when -mabi=ibmlongdouble is used, we want to map IFmode into
+   TFmode.  */
 static machine_mode
 rs6000_translate_mode_attribute (machine_mode mode)
 {
-  if ((FLOAT128_IEEE_P (mode)
-       && ieee128_float_type_node == long_double_type_node)
-      || (FLOAT128_IBM_P (mode)
-	  && ibm128_float_type_node == long_double_type_node))
+  if (FLOAT128_IBM_P (mode)
+      && ibm128_float_type_node == long_double_type_node)
     return COMPLEX_MODE_P (mode) ? E_TCmode : E_TFmode;
+  else if (FLOAT128_IEEE_P (mode))
+    return COMPLEX_MODE_P (mode) ? E_KCmode : E_KFmode;
   return mode;
 }
 
@@ -23863,13 +23871,10 @@  rs6000_libgcc_floating_mode_supported_p (scalar_float_mode mode)
     case E_TFmode:
       return true;
 
-      /* We only return true for KFmode if IEEE 128-bit types are supported, and
-	 if long double does not use the IEEE 128-bit format.  If long double
-	 uses the IEEE 128-bit format, it will use TFmode and not KFmode.
-	 Because the code will not use KFmode in that case, there will be aborts
-	 because it can't find KFmode in the Floatn types.  */
+      /* We only return true for KFmode if IEEE 128-bit types are
+	 supported.  */
     case E_KFmode:
-      return TARGET_FLOAT128_TYPE && !TARGET_IEEEQUAD;
+      return TARGET_FLOAT128_TYPE;
 
     default:
       return false;
@@ -23903,7 +23908,7 @@  rs6000_floatn_mode (int n, bool extended)
 
 	case 64:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
@@ -23927,7 +23932,7 @@  rs6000_floatn_mode (int n, bool extended)
 
 	case 128:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw12.c b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
new file mode 100644
index 00000000000..d08b4cbc883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
@@ -0,0 +1,137 @@ 
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Insure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with _Float128.  This is the same as float128-hw4.c, except the type
+   _Float128 is used, and the IEEE 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw13.c b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
new file mode 100644
index 00000000000..51a3cd4802b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
@@ -0,0 +1,137 @@ 
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Insure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with __float128.  This is the same as float128-hw4.c, except the type
+   __float128 is used, and the IBM 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE __float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
index fc149169bc6..3f6717825b7 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
@@ -118,6 +118,11 @@  nfms_odd (TYPE a, TYPE b, TYPE c)
   return -__builtin_fmaf128_round_to_odd (a, b, -c);
 }
 
+/* In using long double instead of _Float128, we might not be able to optimize
+   __builtin_fmaf128_round_to_odd (a, b, -c) into using xsmsubqpo instead of
+   xsnegqp and xsmaddqpo due to conversions between TFmode and KFmode.  So just
+   recognize that the did the FMA optimization.  */
+
 /* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
@@ -126,11 +131,8 @@  nfms_odd (TYPE a, TYPE b, TYPE c)
 /* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
-/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsn?m(add|sub)qpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
-/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
 /* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
 /* { dg-final { scan-assembler-not {\mbl\M}         } } */