[3/3] RISC-V: rv32/DF: Prevent 2 SImode loads using XTheadMemIdx

Message ID 20240807062714.2989672-3-christoph.muellner@vrull.eu
State Committed
Commit 33aca37ebc0c06e9c9240a8a0c13e31a0bcd4efb
Delegated to: Jeff Law
Headers
Series [1/3] RISC-V: testsuite: xtheadfmemidx: Rename test and add similar Zfa test |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
rivoscibot/toolchain-ci-rivos-lint success Lint passed
rivoscibot/toolchain-ci-rivos-apply-patch success Patch applied
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gc-lp64d-non-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc-lp64d-non-multilib success Build passed
rivoscibot/toolchain-ci-rivos-test success Testing passed

Commit Message

Christoph Müllner Aug. 7, 2024, 6:27 a.m. UTC
  When enabling XTheadFmv/Zfa and XThead(F)MemIdx, we might end up
with the following insn (registers are examples, but of correct class):

(set (reg:DF a4)
     (mem:DF (plus:SI (mult:SI (reg:SI a0)
			       (const_int 8))
		      (reg:SI a5))))

This is a result of an attempt to load the DF register via two SI
register loads followed by XTheadFmv/Zfa instructions to move the
contents of the two SI registers into the DF register.

The two loads are generated in riscv_split_doubleword_move(),
where the second load adds an offset of 4 to load address.
While this works fine for RVI loads, this can't be handled
for XTheadMemIdx addresses.  Coming back to the example above,
we would end up with the following insn, which can't be simplified
or matched:

(set (reg:SI a4)
     (mem:SI (plus:SI (plus:SI (mult:SI (reg:SI a0)
					(const_int 8))
			       (reg:SI a5))
		      (const_int 4))))

This triggered an ICE in the past, which was resolved in b79cd204c780,
which also added the test xtheadfmemidx-medany.c, where the examples
are from.  The patch postponed the optimization insn_and_split pattern
for XThead(F)MemIdx, so that the situation could effectively be avoided.

Since we don't want to rely on these optimization pattern in the future,
we need a different solution.  Therefore, this patch restricts the
movdf_hardfloat_rv32 insn to not match for split-double-word-moves
with XThead(F)MemIdx operands.  This ensures we don't need to split
them up later.

When looking at the code generation of the test file, we can see that
we have less GP<->FP conversions, but cannot use the indexed loads.
The new sequence is identical to rv32gc_xtheadfmv (similar to rv32gc_zfa).

Old:
[...]
	lla     a5,.LANCHOR0
	th.flrd fa5,a5,a0,3
	fmv.x.w a4,fa5
	th.fmv.x.hw     a5,fa5
.L1:
	fmv.w.x fa0,a4
	th.fmv.hw.x     fa0,a5
	ret
[...]

New:
[...]
	lla     a5,.LANCHOR0
	slli    a4,a0,3
	add     a4,a4,a5
	lw      a5,4(a4)
	lw      a4,0(a4)
.L1:
	fmv.w.x fa0,a4
	th.fmv.hw.x     fa0,a5
	ret
[...]

This was tested (together with the patch that eliminates the
XTheadMemIdx optimization patterns) with SPEC CPU 2017 intrate
on QEMU (RV64/lp64d).

gcc/ChangeLog:

	* config/riscv/constraints.md (th_m_noi): New constraint.
	* config/riscv/riscv.md: Adjust movdf_hardfloat_rv32 for
	XTheadMemIdx.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: Adjust.
	* gcc.target/riscv/xtheadfmemidx-zfa-medany.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/constraints.md                          | 9 +++++++++
 gcc/config/riscv/riscv.md                                | 4 ++--
 .../gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c    | 3 ++-
 .../gcc.target/riscv/xtheadfmemidx-zfa-medany.c          | 3 ++-
 4 files changed, 15 insertions(+), 4 deletions(-)
  

Comments

Jeff Law Aug. 7, 2024, 3:13 p.m. UTC | #1
On 8/7/24 12:27 AM, Christoph Müllner wrote:
> When enabling XTheadFmv/Zfa and XThead(F)MemIdx, we might end up
> with the following insn (registers are examples, but of correct class):
> 
> (set (reg:DF a4)
>       (mem:DF (plus:SI (mult:SI (reg:SI a0)
> 			       (const_int 8))
> 		      (reg:SI a5))))
> 
> This is a result of an attempt to load the DF register via two SI
> register loads followed by XTheadFmv/Zfa instructions to move the
> contents of the two SI registers into the DF register.
> 
> The two loads are generated in riscv_split_doubleword_move(),
> where the second load adds an offset of 4 to load address.
> While this works fine for RVI loads, this can't be handled
> for XTheadMemIdx addresses.  Coming back to the example above,
> we would end up with the following insn, which can't be simplified
> or matched:
> 
> (set (reg:SI a4)
>       (mem:SI (plus:SI (plus:SI (mult:SI (reg:SI a0)
> 					(const_int 8))
> 			       (reg:SI a5))
> 		      (const_int 4))))
> 
> This triggered an ICE in the past, which was resolved in b79cd204c780,
> which also added the test xtheadfmemidx-medany.c, where the examples
> are from.  The patch postponed the optimization insn_and_split pattern
> for XThead(F)MemIdx, so that the situation could effectively be avoided.
> 
> Since we don't want to rely on these optimization pattern in the future,
> we need a different solution.  Therefore, this patch restricts the
> movdf_hardfloat_rv32 insn to not match for split-double-word-moves
> with XThead(F)MemIdx operands.  This ensures we don't need to split
> them up later.
> 
> When looking at the code generation of the test file, we can see that
> we have less GP<->FP conversions, but cannot use the indexed loads.
> The new sequence is identical to rv32gc_xtheadfmv (similar to rv32gc_zfa).
> 
> Old:
> [...]
> 	lla     a5,.LANCHOR0
> 	th.flrd fa5,a5,a0,3
> 	fmv.x.w a4,fa5
> 	th.fmv.x.hw     a5,fa5
> .L1:
> 	fmv.w.x fa0,a4
> 	th.fmv.hw.x     fa0,a5
> 	ret
> [...]
> 
> New:
> [...]
> 	lla     a5,.LANCHOR0
> 	slli    a4,a0,3
> 	add     a4,a4,a5
> 	lw      a5,4(a4)
> 	lw      a4,0(a4)
> .L1:
> 	fmv.w.x fa0,a4
> 	th.fmv.hw.x     fa0,a5
> 	ret
> [...]
> 
> This was tested (together with the patch that eliminates the
> XTheadMemIdx optimization patterns) with SPEC CPU 2017 intrate
> on QEMU (RV64/lp64d).
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/constraints.md (th_m_noi): New constraint.
> 	* config/riscv/riscv.md: Adjust movdf_hardfloat_rv32 for
> 	XTheadMemIdx.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: Adjust.
> 	* gcc.target/riscv/xtheadfmemidx-zfa-medany.c: Likewise.
OK

Note I think there's offsettable_address or something like that which 
can be used to help with these scenarios if you wanted to try and 
improve things.  I don't think the problem you've run into is inherently 
specific to risc-v.  Other targets split double sized moves and 
sometimes need to restrict the kinds of memory addresses matched.

jeff
  

Patch

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index a9ee346af6f0..3ab6d5426223 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -243,6 +243,15 @@  (define_memory_constraint "th_m_miu"
   (and (match_code "mem")
        (match_test "th_memidx_legitimate_index_p (op, true)")))
 
+(define_memory_constraint "th_m_noi"
+  "@internal
+   A MEM with does not match XTheadMemIdx operands."
+  (and (match_code "mem")
+       (and (match_test "!th_memidx_legitimate_modify_p (op, true)")
+	    (and (match_test "!th_memidx_legitimate_modify_p (op, false)")
+		 (and (match_test "!th_memidx_legitimate_index_p (op, false)")
+		      (match_test "!th_memidx_legitimate_index_p (op, true)"))))))
+
 ;; CORE-V Constraints
 (define_constraint "CV_alu_pow2"
   "@internal
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d9f6c1765d0a..f46851dd4717 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2592,8 +2592,8 @@  (define_expand "movdf"
 ;; In RV32, we lack fmv.x.d and fmv.d.x.  Go through memory instead.
 ;; (However, we can still use fcvt.d.w to zero a floating-point register.)
 (define_insn "*movdf_hardfloat_rv32"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,   f,f,f,m,m,*zmvf,*zmvr,  *r,*r,*m")
-	(match_operand:DF 1 "move_operand"         " f,zfli,G,m,f,G,*zmvr,*zmvf,*r*G,*m,*r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,   f,f,f,m,m,*zmvf,*zmvr,  *r,*r,*th_m_noi")
+	(match_operand:DF 1 "move_operand"         " f,zfli,G,m,f,G,*zmvr,*zmvf,*r*G,*th_m_noi,*r"))]
   "!TARGET_64BIT && TARGET_DOUBLE_FLOAT
    && (register_operand (operands[0], DFmode)
        || reg_or_0_operand (operands[1], DFmode))"
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c
index 7c70b7758246..6746c3140578 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c
@@ -35,5 +35,6 @@  double foo (int i, int j)
   return z;
 }
 
-/* { dg-final { scan-assembler-times {\mth\.flrd\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mth\.flrd\M} } } */
+/* { dg-final { scan-assembler-times {\mlw\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mth\.fmv\.hw\.x\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c
index 4215eab11951..fb1ac2b735c3 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmemidx-zfa-medany.c
@@ -35,5 +35,6 @@  double foo (int i, int j)
   return z;
 }
 
-/* { dg-final { scan-assembler-times {\mth\.flrd\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mth\.flrd\M} } } */
+/* { dg-final { scan-assembler-times {\mlw\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mfmvp\.d\.x\M} 3 } } */