Add widening expansion of MULT_HIGHPART_EXPR for integral modes

Message ID 22268514.EfDdHjke4D@fomalhaut
State New
Headers
Series Add widening expansion of MULT_HIGHPART_EXPR for integral modes |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Testing passed

Commit Message

Eric Botcazou April 29, 2024, 4:11 p.m. UTC
  Hi,

for integral modes, the expansion of MULT_HIGHPART_EXPR requires the presence 
of an {s,u}mul_highpart optab whereas, for vector modes, widening expansion is
supported.  This adds a widening expansion for integral modes too, which is in 
fact already implemented in expmed_mult_highpart_optab.  We'll use that in a 
subsequent change to the Ada front-end to generate fast modulo reduction for 
modular types with nonbinary modulus (a little controversial Ada 95 feature).

Tested on x86-64/Linux, OK for the mainline?

2024-04-29  Eric Botcazou  <ebotcazou@adacore.com>

	* expmed.h (expmed_mult_highpart_optab): Declare.
	* expmed.cc (expmed_mult_highpart_optab): Remove static keyword.
	Do not assume that OP1 is a constant integer.  Fix pasto.
	(expmed_mult_highpart): Pass OP1 narrowed to MODE in all the calls
	to expmed_mult_highpart_optab.
	* optabs-query.cc (can_mult_highpart_p): Use 2 for integer widening
	and shift subsequent values accordingly.
	* optabs.cc (expand_mult_highpart): Call expmed_mult_highpart_optab
	when can_mult_highpart_p returns 2 and adjust to above change.
  

Comments

Li, Pan2 May 19, 2024, 1:32 a.m. UTC | #1
Hi Botcazou,

Just notice that this patch may result in some ICE when build libc++ for the riscv port, details as below.
Please note not all configuration can reproduce this issue, feel free to ping me if you cannot reproduce this issue. CC more riscv port people for awareness.

during GIMPLE pass: slp
In file included from /home/box/panli/gnu-toolchain/gcc/libstdc++-v3/src/c++17/floating_to_chars.cc:124:
/home/box/panli/gnu-toolchain/gcc/libstdc++-v3/src/c++17/ryu/generic_128.c: In function 'int {anonymous}::ryu::generic128::generic_to_chars(floating_decimal_128, char*)':
/home/box/panli/gnu-toolchain/gcc/libstdc++-v3/src/c++17/ryu/generic_128.c:251:5: internal compiler error: in require, at machmode.h:313
  251 | int generic_to_chars(const struct floating_decimal_128 v, char* const result) {
      |     ^~~~~~~~~~~~~~~~
0x30d526b diagnostic_impl(rich_location*, diagnostic_metadata const*, int, char const*, __va_list_tag (*) [1], diagnostic_t)
        ???:0
0x30d637e internal_error(char const*, ...)
        ???:0
0xdd9c9d fancy_abort(char const*, int, char const*)
        ???:0
0xbb4f9f can_mult_highpart_p(machine_mode, bool) [clone .cold]
        ???:0
0x17384b3 default_preferred_div_as_shifts_over_mult(tree_node const*)
        ???:0
0x2b34967 vect_recog_divmod_pattern(vec_info*, _stmt_vec_info*, tree_node**)
        ???:0
0x2b2f106 vect_pattern_recog_1(vec_info*, vect_recog_func*, _stmt_vec_info*)
        ???:0
0x2b2f3b1 vect_pattern_recog(vec_info*)
        ???:0
0x1a5e403 vect_slp_region(vec<basic_block_def*, va_heap, vl_ptr>, vec<data_reference*, va_heap, vl_ptr>, vec<int, va_heap, vl_ptr>*, unsigned int, loop*)
        ???:0
0x1a60505 vect_slp_bbs(vec<basic_block_def*, va_heap, vl_ptr> const&, loop*)
        ???:0
0x1a608f3 vect_slp_function(function*)
        ???:0
0x1a6c2a1 (anonymous namespace)::pass_slp_vectorize::execute(function*)
        ???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.

Pan

-----Original Message-----
From: Eric Botcazou <botcazou@adacore.com> 
Sent: Tuesday, April 30, 2024 12:11 AM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Add widening expansion of MULT_HIGHPART_EXPR for integral modes

Hi,

for integral modes, the expansion of MULT_HIGHPART_EXPR requires the presence 
of an {s,u}mul_highpart optab whereas, for vector modes, widening expansion is
supported.  This adds a widening expansion for integral modes too, which is in 
fact already implemented in expmed_mult_highpart_optab.  We'll use that in a 
subsequent change to the Ada front-end to generate fast modulo reduction for 
modular types with nonbinary modulus (a little controversial Ada 95 feature).

Tested on x86-64/Linux, OK for the mainline?

2024-04-29  Eric Botcazou  <ebotcazou@adacore.com>

	* expmed.h (expmed_mult_highpart_optab): Declare.
	* expmed.cc (expmed_mult_highpart_optab): Remove static keyword.
	Do not assume that OP1 is a constant integer.  Fix pasto.
	(expmed_mult_highpart): Pass OP1 narrowed to MODE in all the calls
	to expmed_mult_highpart_optab.
	* optabs-query.cc (can_mult_highpart_p): Use 2 for integer widening
	and shift subsequent values accordingly.
	* optabs.cc (expand_mult_highpart): Call expmed_mult_highpart_optab
	when can_mult_highpart_p returns 2 and adjust to above change.
  

Patch

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 60f65c7acc5..ccc671e922d 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2742,8 +2742,7 @@  static rtx expand_mult_const (machine_mode, rtx, HOST_WIDE_INT, rtx,
 static unsigned HOST_WIDE_INT invert_mod2n (unsigned HOST_WIDE_INT, int);
 static rtx extract_high_half (scalar_int_mode, rtx);
 static rtx expmed_mult_highpart (scalar_int_mode, rtx, rtx, rtx, int, int);
-static rtx expmed_mult_highpart_optab (scalar_int_mode, rtx, rtx, rtx,
-				       int, int);
+
 /* Compute and return the best algorithm for multiplying by T.
    The algorithm must cost less than cost_limit
    If retval.cost >= COST_LIMIT, no algorithm was found and all
@@ -3850,30 +3849,25 @@  extract_high_half (scalar_int_mode mode, rtx op)
   return convert_modes (mode, wider_mode, op, 0);
 }
 
-/* Like expmed_mult_highpart, but only consider using a multiplication
-   optab.  OP1 is an rtx for the constant operand.  */
+/* Like expmed_mult_highpart, but only consider using multiplication optab.  */
 
-static rtx
+rtx
 expmed_mult_highpart_optab (scalar_int_mode mode, rtx op0, rtx op1,
 			    rtx target, int unsignedp, int max_cost)
 {
-  rtx narrow_op1 = gen_int_mode (INTVAL (op1), mode);
+  const scalar_int_mode wider_mode = GET_MODE_WIDER_MODE (mode).require ();
+  const bool speed = optimize_insn_for_speed_p ();
+  const int size = GET_MODE_BITSIZE (mode);
   optab moptab;
   rtx tem;
-  int size;
-  bool speed = optimize_insn_for_speed_p ();
-
-  scalar_int_mode wider_mode = GET_MODE_WIDER_MODE (mode).require ();
-
-  size = GET_MODE_BITSIZE (mode);
 
   /* Firstly, try using a multiplication insn that only generates the needed
      high part of the product, and in the sign flavor of unsignedp.  */
   if (mul_highpart_cost (speed, mode) < max_cost)
     {
       moptab = unsignedp ? umul_highpart_optab : smul_highpart_optab;
-      tem = expand_binop (mode, moptab, op0, narrow_op1, target,
-			  unsignedp, OPTAB_DIRECT);
+      tem = expand_binop (mode, moptab, op0, op1, target, unsignedp,
+			  OPTAB_DIRECT);
       if (tem)
 	return tem;
     }
@@ -3886,12 +3880,12 @@  expmed_mult_highpart_optab (scalar_int_mode mode, rtx op0, rtx op1,
 	  + 4 * add_cost (speed, mode) < max_cost))
     {
       moptab = unsignedp ? smul_highpart_optab : umul_highpart_optab;
-      tem = expand_binop (mode, moptab, op0, narrow_op1, target,
-			  unsignedp, OPTAB_DIRECT);
+      tem = expand_binop (mode, moptab, op0, op1, target, !unsignedp,
+			  OPTAB_DIRECT);
       if (tem)
 	/* We used the wrong signedness.  Adjust the result.  */
-	return expand_mult_highpart_adjust (mode, tem, op0, narrow_op1,
-					    tem, unsignedp);
+	return expand_mult_highpart_adjust (mode, tem, op0, op1, tem,
+					    unsignedp);
     }
 
   /* Try widening multiplication.  */
@@ -3899,8 +3893,8 @@  expmed_mult_highpart_optab (scalar_int_mode mode, rtx op0, rtx op1,
   if (convert_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
       && mul_widen_cost (speed, wider_mode) < max_cost)
     {
-      tem = expand_binop (wider_mode, moptab, op0, narrow_op1, 0,
-			  unsignedp, OPTAB_WIDEN);
+      tem = expand_binop (wider_mode, moptab, op0, op1, NULL_RTX, unsignedp,
+			  OPTAB_WIDEN);
       if (tem)
 	return extract_high_half (mode, tem);
     }
@@ -3941,14 +3935,14 @@  expmed_mult_highpart_optab (scalar_int_mode mode, rtx op0, rtx op1,
 	  + 2 * shift_cost (speed, mode, size-1)
 	  + 4 * add_cost (speed, mode) < max_cost))
     {
-      tem = expand_binop (wider_mode, moptab, op0, narrow_op1,
-			  NULL_RTX, ! unsignedp, OPTAB_WIDEN);
+      tem = expand_binop (wider_mode, moptab, op0, op1, NULL_RTX, !unsignedp,
+			  OPTAB_WIDEN);
       if (tem != 0)
 	{
 	  tem = extract_high_half (mode, tem);
 	  /* We used the wrong signedness.  Adjust the result.  */
-	  return expand_mult_highpart_adjust (mode, tem, op0, narrow_op1,
-					      target, unsignedp);
+	  return expand_mult_highpart_adjust (mode, tem, op0, op1, target,
+					      unsignedp);
 	}
     }
 
@@ -3970,18 +3964,19 @@  static rtx
 expmed_mult_highpart (scalar_int_mode mode, rtx op0, rtx op1,
 		      rtx target, int unsignedp, int max_cost)
 {
+  const bool speed = optimize_insn_for_speed_p ();
   unsigned HOST_WIDE_INT cnst1;
   int extra_cost;
   bool sign_adjust = false;
   enum mult_variant variant;
   struct algorithm alg;
-  rtx tem;
-  bool speed = optimize_insn_for_speed_p ();
+  rtx narrow_op1, tem;
 
   /* We can't support modes wider than HOST_BITS_PER_INT.  */
   gcc_assert (HWI_COMPUTABLE_MODE_P (mode));
 
   cnst1 = INTVAL (op1) & GET_MODE_MASK (mode);
+  narrow_op1 = gen_int_mode (INTVAL (op1), mode);
 
   /* We can't optimize modes wider than BITS_PER_WORD.
      ??? We might be able to perform double-word arithmetic if
@@ -3989,7 +3984,7 @@  expmed_mult_highpart (scalar_int_mode mode, rtx op0, rtx op1,
      synth_mult etc. assume single-word operations.  */
   scalar_int_mode wider_mode = GET_MODE_WIDER_MODE (mode).require ();
   if (GET_MODE_BITSIZE (wider_mode) > BITS_PER_WORD)
-    return expmed_mult_highpart_optab (mode, op0, op1, target,
+    return expmed_mult_highpart_optab (mode, op0, narrow_op1, target,
 				       unsignedp, max_cost);
 
   extra_cost = shift_cost (speed, mode, GET_MODE_BITSIZE (mode) - 1);
@@ -4007,7 +4002,8 @@  expmed_mult_highpart (scalar_int_mode mode, rtx op0, rtx op1,
     {
       /* See whether the specialized multiplication optabs are
 	 cheaper than the shift/add version.  */
-      tem = expmed_mult_highpart_optab (mode, op0, op1, target, unsignedp,
+      tem = expmed_mult_highpart_optab (mode, op0, narrow_op1, target,
+					unsignedp,
 					alg.cost.cost + extra_cost);
       if (tem)
 	return tem;
@@ -4022,7 +4018,7 @@  expmed_mult_highpart (scalar_int_mode mode, rtx op0, rtx op1,
 
       return tem;
     }
-  return expmed_mult_highpart_optab (mode, op0, op1, target,
+  return expmed_mult_highpart_optab (mode, op0, narrow_op1, target,
 				     unsignedp, max_cost);
 }
 
diff --git a/gcc/expmed.h b/gcc/expmed.h
index f5375c84f25..0a19176b77a 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -724,5 +724,7 @@  extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
 extern rtx expand_mult (machine_mode, rtx, rtx, rtx, int, bool = false);
 extern rtx expand_mult_highpart_adjust (scalar_int_mode, rtx, rtx, rtx,
 					rtx, int);
+extern rtx expmed_mult_highpart_optab (scalar_int_mode, rtx, rtx, rtx,
+				       int, int);
 
 #endif  // EXPMED_H
diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index e36a1506c79..de145be7075 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -502,19 +502,35 @@  find_widening_optab_handler_and_mode (optab op, machine_mode to_mode,
   return CODE_FOR_nothing;
 }
 
-/* Return non-zero if a highpart multiply is supported of can be synthisized.
+/* Return non-zero if a highpart multiply is supported or can be synthesized.
    For the benefit of expand_mult_highpart, the return value is 1 for direct,
-   2 for even/odd widening, and 3 for hi/lo widening.  */
+   2 for integral widening, 3 for even/odd widening, 4 for hi/lo widening.  */
 
 int
 can_mult_highpart_p (machine_mode mode, bool uns_p)
 {
   optab op;
+  scalar_int_mode int_mode;
 
   op = uns_p ? umul_highpart_optab : smul_highpart_optab;
   if (optab_handler (op, mode) != CODE_FOR_nothing)
     return 1;
 
+  /* If the mode is integral, synth from widening or larger operations.  */
+  if (is_a <scalar_int_mode> (mode, &int_mode))
+    {
+      scalar_int_mode wider_mode = GET_MODE_WIDER_MODE (int_mode).require ();
+
+      op = uns_p ? umul_widen_optab : smul_widen_optab;
+      if (convert_optab_handler (op, wider_mode, mode) != CODE_FOR_nothing)
+	return 2;
+
+      /* The test on the size comes from expmed_mult_highpart_optab.  */
+      if (optab_handler (smul_optab, wider_mode) != CODE_FOR_nothing
+	  && GET_MODE_BITSIZE (int_mode) - 1 < BITS_PER_WORD)
+	return 2;
+    }
+
   /* If the mode is an integral vector, synth from widening operations.  */
   if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
     return 0;
@@ -535,7 +551,7 @@  can_mult_highpart_p (machine_mode mode, bool uns_p)
 			    + ((i & 1) ? nunits : 0));
 	  vec_perm_indices indices (sel, 2, nunits);
 	  if (can_vec_perm_const_p (mode, mode, indices))
-	    return 2;
+	    return 3;
 	}
     }
 
@@ -551,7 +567,7 @@  can_mult_highpart_p (machine_mode mode, bool uns_p)
 	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
 	  vec_perm_indices indices (sel, 2, nunits);
 	  if (can_vec_perm_const_p (mode, mode, indices))
-	    return 3;
+	    return 4;
 	}
     }
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ce91f94ed43..e7913884567 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -6751,10 +6751,13 @@  expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
       return expand_binop (mode, tab1, op0, op1, target, uns_p,
 			   OPTAB_LIB_WIDEN);
     case 2:
+      return expmed_mult_highpart_optab (as_a <scalar_int_mode> (mode),
+					 op0, op1, target, uns_p, INT_MAX);
+    case 3:
       tab1 = uns_p ? vec_widen_umult_even_optab : vec_widen_smult_even_optab;
       tab2 = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
       break;
-    case 3:
+    case 4:
       tab1 = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
       tab2 = uns_p ? vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
       if (BYTES_BIG_ENDIAN)
@@ -6783,7 +6786,7 @@  expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
   m2 = gen_lowpart (mode, eops[0].value);
 
   vec_perm_builder sel;
-  if (method == 2)
+  if (method == 3)
     {
       /* The encoding has 2 interleaved stepped patterns.  */
       sel.new_vector (GET_MODE_NUNITS (mode), 2, 3);