[to-be-committed,PR,target/125387,RISC-V] Properly cost mulh for RISC-V

Message ID 98e13ce3-62e8-4123-b104-add995579105@oss.qualcomm.com
State Superseded
Headers
Series [to-be-committed,PR,target/125387,RISC-V] Properly cost mulh for RISC-V

Checks

Context Check Description
rivoscibot/toolchain-ci-rivos-apply-patch success Patch applied
rivoscibot/toolchain-ci-rivos-lint warning Lint failed
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-test fail Testing failed
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap fail Patch failed to apply

Commit Message

Jeffrey Law June 3, 2026, 10:20 p.m. UTC
This is a patch Raphael wrote a few years back that never got
upstreamed.  I vaguely recall evaluating the patch and deciding to
drop it, both internally and from the queue of things to upstream, but
the details of how I reached that conclusion are lost.

The core problem is that we didn't recognize the fairly complex widening
multiply RTL, so it got costed by recursion, which is painful on its
own.  To add insult to injury, the input operands are in TImode which,
IIRC, further increased the cost of the mulh insn.
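
For illustration only (this is not part of the patch): at the source
level, the RTL in question roughly corresponds to taking the high half
of a double-width product.  A minimal C sketch, assuming a 64-bit target
and a compiler with the __int128 extension (GCC or Clang).  The helper
names mulh_s64 and mulh_u64 are hypothetical, and the RTL in the
comments is an approximation of the shape being matched, not a dump:

```c
#include <stdint.h>

/* Assumption: __int128 is available (GCC/Clang extension, 64-bit targets).  */
__extension__ typedef __int128 i128;
__extension__ typedef unsigned __int128 u128;

/* Signed high half of a 64x64 multiply.  Approximate RTL shape on rv64:
     (truncate:DI
       (lshiftrt:TI (mult:TI (sign_extend:TI a) (sign_extend:TI b))
                    (const_int 64)))
   GCC defines >> on negative signed values as an arithmetic shift; the
   low 64 bits of the result are the same as for a logical shift.  */
int64_t
mulh_s64 (int64_t a, int64_t b)
{
  return (int64_t) (((i128) a * (i128) b) >> 64);
}

/* Unsigned variant: same shape with zero_extend on both operands.  */
uint64_t
mulh_u64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((u128) a * (u128) b) >> 64);
}
```

On rv64 with the M extension, each helper would be expected to map to a
single mulh or mulhu instruction, which is why costing the whole
expression as one integer multiply is the sensible outcome.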

The net result was a totally bogus cost, well off the charts, which in
turn discouraged using multiplies for division-by-constant operations,
as can be seen in the BZ.

This patch from Raphael fixes recognition of the mulh RTL and uses the
integer multiply cost out of the costing structure.  This results in
sensible costs for the multiply, so a multiply-by-reciprocal is used
instead of a division in more cases.
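
As a reminder of what multiply-by-reciprocal means here, a minimal
sketch for the simplest case, unsigned 64-bit division by 10.  The names
mulhu64 and udiv10 are hypothetical, __int128 is again assumed to be
available, and the constant 0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10).  The
signed divisions by 10000 and 30 in the new test need a different
constant and an extra sign fixup, which is not shown:

```c
#include <stdint.h>

/* Assumption: unsigned __int128 is available (GCC/Clang, 64-bit targets).  */
__extension__ typedef unsigned __int128 u128;

/* High 64 bits of the unsigned 64x64 product (what mulhu computes).  */
static uint64_t
mulhu64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((u128) a * (u128) b) >> 64);
}

/* x / 10 without a divide: multiply by ceil(2^67 / 10), keep the high
   64 bits, then shift right by the remaining 3 bits (64 + 3 = 67).  */
uint64_t
udiv10 (uint64_t x)
{
  return mulhu64 (x, UINT64_C (0xCCCCCCCCCCCCCCCD)) >> 3;
}
```

The sequence is one multiply-high plus a shift, so it only wins when
the multiply is costed realistically relative to the divide.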

It's been tested on riscv32-elf and riscv64-elf without regressions.  I 
haven't bootstrapped it as I had my system disabled yesterday for 
"reasons".  I'll obviously wait for the pre-commit verdict.

Jeff
PR target/125387
gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Recognize and properly cost
	mulh instructions.

gcc/testsuite
	* gcc.target/riscv/pr125387.c: New test.
  

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a2730a90ca8b..99e1e4b3d8d2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4677,8 +4677,26 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	  return true;
 	}
       gcc_fallthrough ();
-    case ASHIFTRT:
     case LSHIFTRT:
+      /* mulh[u] pattern */
+      if (GET_CODE (x) == LSHIFTRT
+	  && TARGET_MUL
+	  && mode == GET_MODE_2XWIDER_MODE (word_mode).require ()
+	  && outer_code == TRUNCATE
+	  && GET_CODE (XEXP (x, 0)) == MULT
+	  && (GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (XEXP (x, 0), 0)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (XEXP (x, 0), 1)) == ZERO_EXTEND
+	      || GET_CODE (XEXP (XEXP (x, 0), 1)) == SIGN_EXTEND)
+	  && (GET_CODE (XEXP (x, 1)) == CONST_INT
+	      && INTVAL (XEXP (x, 1)) == BITS_PER_WORD))
+	{
+	  *total = COSTS_N_INSNS (tune_param->int_mul[mode == TImode]);
+	  return true;
+	}
+      gcc_fallthrough ();
+
+    case ASHIFTRT:
       *total = riscv_binary_cost (x, SINGLE_SHIFT_COST,
 				  CONSTANT_P (XEXP (x, 1)) ? 4 : 9);
       return false;
diff --git a/gcc/testsuite/gcc.target/riscv/pr125387.c b/gcc/testsuite/gcc.target/riscv/pr125387.c
new file mode 100644
index 000000000000..c381b7a28e21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr125387.c
@@ -0,0 +1,9 @@ 
+/* { dg-do compile { target rv64 } } */
+/* { dg-additional-options "-mtune=tt-ascalon-d8 -march=rv64gc" } */
+/* { dg-skip-if "" { *-*-* } { "-Os" "-Oz" } } */
+
+long foo1 (long x) { return x / 10000L; }
+long foo2 (long x) { return x / 30; }
+
+/* { dg-final { scan-assembler-times "mulh" 2 } } */
+/* { dg-final { scan-assembler-not "\tdiv\t" } } */