[committed] CRIS: Add new peephole2 "lra_szext_decomposed_indir_plus"

Message ID 20240904023038.A897020427@pchp3.se.axis.com
State Committed
Commit 62dd893ff8a12a1d28f595b4e5bc43cf9f7d1c07
Series [committed] CRIS: Add new peephole2 "lra_szext_decomposed_indir_plus"

Checks

Context                                         Check    Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm      warning  Patch is already merged
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64  warning  Patch is already merged

Commit Message

Hans-Peter Nilsson Sept. 4, 2024, 2:30 a.m. UTC
I thought I had already committed this, but it looks like it
was left dangling while the make_more_copies patch (now
committed) was in limbo and I had disabled late-combine for
(coremark) performance reasons.  FWIW, that reason still holds
at r15-3386-gaf1500dd8c00 (2.6% regression).

Tested cris-elf with/without -flate-combine-instructions.

-- >8 --
Exposed when running the test-suite with -flate-combine-instructions.

	* config/cris/cris.md (lra_szext_decomposed_indir_plus): New
	peephole2 pattern.
---
 gcc/config/cris/cris.md | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
  

Patch

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index c15395bd84c4..e066d5c920a9 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -3024,6 +3024,7 @@  (define_peephole2 ; lra_szext_decomposed
 ;; Re-compose a decomposed "indirect offset" address for a szext
 ;; operation.  The non-clobbering "addi" is generated by LRA.
 ;; This and lra_szext_decomposed is covered by cris/rld-legit1.c.
+;; (Unfortunately not true when enabling late-combine.)
 (define_peephole2 ; lra_szext_decomposed_indirect_with_offset
   [(parallel
     [(set (match_operand:SI 0 "register_operand")
@@ -3046,6 +3047,50 @@  (define_peephole2 ; lra_szext_decomposed_indirect_with_offset
        (mem:BW2 (plus:SI (szext:SI (mem:BW (match_dup 1))) (match_dup 2)))))
      (clobber (reg:CC CRIS_CC0_REGNUM))])])
 
+;; When enabling late-combine, we get a slightly changed register
+;; allocation.  The two allocations for the pseudo-registers involved
+;; in the matching pattern get "swapped" and the (plus ...) in the
+;; pattern above is now a load from a stack-slot.  If peephole2 is
+;; disabled, we see that the original sequence is actually improved;
+;; one less incoming instruction, a load.  We need to "undo" that
+;; improvement a bit and move that load "back" to before the sequence
+;; we combine in lra_szext_decomposed_indirect_with_offset.  But that
+;; changed again, so there's no define_peephole2 for that sequence
+;; here, because it'd be hard or impossible to write a matching
+;; test-case.  A few commits later, the incoming pattern sequence has
+;; changed again: back to the original but with the (plus...) part of
+;; the address inside the second memory reference.
+;; Coverage: cris/rld-legit1.c@r15-1880-gce34fcc572a0dc or
+;; r15-3386-gaf1500dd8c00 when adding -flate-combine-instructions.
+
+(define_peephole2 ; lra_szext_decomposed_indir_plus
+  [(parallel
+    [(set (match_operand:SI 0 "register_operand")
+	  (sign_extend:SI (mem:BW (match_operand:SI 1 "register_operand"))))
+     (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (parallel
+    [(set (match_operand:SI 3 "register_operand")
+	  (szext:SI (mem:BW2 (plus:SI
+			      (match_operand:SI 4 "register_operand")
+			      (match_operand:SI 2 "register_operand")))))
+     (clobber (reg:CC CRIS_CC0_REGNUM))])]
+  "(REGNO (operands[0]) == REGNO (operands[3])
+    || peep2_reg_dead_p (3, operands[0]))
+   && (REGNO (operands[0]) == REGNO (operands[1])
+       || peep2_reg_dead_p (3, operands[0]))
+   && (rtx_equal_p (operands[2], operands[0])
+       || rtx_equal_p (operands[4], operands[0]))"
+  [(parallel
+    [(set
+      (match_dup 3)
+      (szext:SI
+       (mem:BW2 (plus:SI (szext:SI (mem:BW (match_dup 1))) (match_dup 2)))))
+     (clobber (reg:CC CRIS_CC0_REGNUM))])]
+{
+  if (! rtx_equal_p (operands[4], operands[0]))
+    operands[2] = operands[4];
+})
+
 ;; Add operations with similar or same decomposed addresses here, when
 ;; encountered - but only when covered by mentioned test-cases for at
 ;; least one of the cases generalized in the pattern.