[3/3] arm, mve: Detect uses of vctp_vpr_generated inside subregs

Message ID 20241129103012.3477414-4-andre.simoesdiasvieira@arm.com
State Changes Requested
Series arm, mve: Fix DLSTP testism and issue after changes in codegen

Checks

Context                                          Check    Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64   success  Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm       success  Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm       success  Test passed

Commit Message

Andre Vieira (lists) Nov. 29, 2024, 10:30 a.m. UTC
  Fix a case where the analysis in 'arm_attempt_dlstp_transform' failed to
detect a use of vctp_vpr_generated because the use was inside a SUBREG, which
rtx_equal_p does not match against the bare register.  Use
reg_overlap_mentioned_p instead, which also detects the register when it is
wrapped in a SUBREG.

gcc/ChangeLog:

	* config/arm/arm.cc (arm_attempt_dlstp_transform): Use
	reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
	vctp_vpr_generated inside subregs.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
	(test10a): ... this.
	(test10b): Variation of test10a with a small change to trigger wrong
	codegen.
---
 gcc/config/arm/arm.cc                         |  3 ++-
 .../gcc.target/arm/mve/dlstp-invalid-asm.c    | 20 ++++++++++++++++++-
 2 files changed, 21 insertions(+), 2 deletions(-)
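
To illustrate the distinction the commit message draws, here is a minimal sketch in plain C. It is a toy model, not GCC code: `toy_rtx`, `toy_equal_p` and `toy_reg_mentioned_p` are hypothetical names invented for this example, and the real `rtx_equal_p` and `reg_overlap_mentioned_p` handle many more cases (modes, hard register ranges, memory, and so on). It only shows why a structural-equality test misses a register wrapped in a SUBREG while a "mentions this register" test does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for GCC's rtx; only the two codes needed here.
   All names in this block are hypothetical.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  unsigned regno;              /* Meaningful for TOY_REG only.  */
  const struct toy_rtx *inner; /* Meaningful for TOY_SUBREG only.  */
};

/* Structural equality, in the spirit of rtx_equal_p: a REG and a
   SUBREG wrapping that same REG have different codes, so they are
   never equal.  */
static bool
toy_equal_p (const struct toy_rtx *x, const struct toy_rtx *y)
{
  if (x->code != y->code)
    return false;
  if (x->code == TOY_REG)
    return x->regno == y->regno;
  return toy_equal_p (x->inner, y->inner);
}

/* Return true if IN refers to register REG, looking through any
   SUBREG wrappers first.  This is a loose analogue of what the patch
   relies on reg_overlap_mentioned_p to do.  */
static bool
toy_reg_mentioned_p (const struct toy_rtx *reg, const struct toy_rtx *in)
{
  while (in->code == TOY_SUBREG)
    in = in->inner;
  return in->regno == reg->regno;
}
```

With a register `vpr = { TOY_REG, 42, NULL }` and a use `{ TOY_SUBREG, 0, &vpr }`, `toy_equal_p (&vpr, &use)` is false while `toy_reg_mentioned_p (&vpr, &use)` is true, which mirrors the missed detection described above.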
  

Comments

Christophe Lyon Nov. 29, 2024, 11 a.m. UTC | #1
On 11/29/24 11:30, Andre Vieira wrote:
> 
> Address a problem we were having where we were missing on detecting uses of
> vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform' because
> the use was inside a SUBREG and rtx_equal_p does not catch that.  Using
> reg_overlap_mentioned_p is much more robust.
> 
> gcc/ChangeLog:
> 
> 	* gcc/config/arm/arm.cc (arm_attempt_dlstp_transform): Use
> 	reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
> 	vctp_vpr_generated inside subregs.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
> 	(test10a): ... this.
> 	(test10b): Variation of test10a with a small change to trigger wrong
> 	codegen.

Thanks, this patch is OK too, provided you update the comment before
test10b as I requested on your previous version.

> ---
>   gcc/config/arm/arm.cc                         |  3 ++-
>   .../gcc.target/arm/mve/dlstp-invalid-asm.c    | 20 ++++++++++++++++++-
>   2 files changed, 21 insertions(+), 2 deletions(-)
>
  

Patch

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7292fddef80..7f82fb94a56 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35847,7 +35847,8 @@  arm_attempt_dlstp_transform (rtx label)
 	  df_ref insn_uses = NULL;
 	  FOR_EACH_INSN_USE (insn_uses, insn)
 	  {
-	    if (rtx_equal_p (vctp_vpr_generated, DF_REF_REG (insn_uses)))
+	    if (reg_overlap_mentioned_p (vctp_vpr_generated,
+					 DF_REF_REG (insn_uses)))
 	      {
 		end_sequence ();
 		return 1;
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
index 26df2d30523..f26754cc482 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
@@ -128,7 +128,7 @@  void test9 (int32_t *a, int32_t *b, int32_t *c, int n)
 }
 
 /* Using a VPR that gets re-generated within the loop.  */
-void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
+void test10a (int32_t *a, int32_t *b, int32_t *c, int n)
 {
   mve_pred16_t p = vctp32q (n);
   while (n > 0)
@@ -145,6 +145,24 @@  void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
     }
 }
 
+/* Using a VPR that gets re-generated within the loop.  */
+void test10b (int32_t *a, int32_t *b, int32_t *c, int n)
+{
+  mve_pred16_t p = vctp32q (n-4);
+  while (n > 0)
+    {
+      int32x4_t va = vldrwq_z_s32 (a, p);
+      p = vctp32q (n);
+      int32x4_t vb = vldrwq_z_s32 (b, p);
+      int32x4_t vc = vaddq_x_s32 (va, vb, p);
+      vstrwq_p_s32 (c, vc, p);
+      c += 4;
+      a += 4;
+      b += 4;
+      n -= 4;
+    }
+}
+
 /* Using vctp32q_m instead of vctp32q.  */
 void test11 (int32_t *a, int32_t *b, int32_t *c, int n, mve_pred16_t p0)
 {
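
As a rough illustration of why test10b belongs in a file of loops that must not become tail-predicated, the sketch below models the predicate arithmetic in scalar C. It is an interpretation, not something stated in the patch: `model_vctp32q`, `active_lanes` and the two `first_va_lanes_*` helpers are hypothetical names, the model assumes `vctp32q (n)` enables the 32-bit lanes whose index is below `n` (four predicate bits per lane), and it assumes a tail-predicated loop would apply the predicate for `n` to every instruction of the first iteration, including the load of `a`.

```c
#include <stdint.h>

/* Scalar model of vctp32q (assumption: lane I of four is active when
   I < N, and each 32-bit lane owns four bits of the 16-bit predicate).  */
static uint16_t
model_vctp32q (uint32_t n)
{
  uint16_t p = 0;
  for (uint32_t lane = 0; lane < 4; lane++)
    if (lane < n)
      p |= (uint16_t) (0xFu << (4 * lane));
  return p;
}

/* Count the 32-bit lanes that are active in predicate P.  */
static int
active_lanes (uint16_t p)
{
  int count = 0;
  for (int lane = 0; lane < 4; lane++)
    if ((p >> (4 * lane)) & 0xF)
      count++;
  return count;
}

/* Lanes loaded from 'a' in the first iteration of test10b as written:
   the predicate comes from vctp32q (n-4).  Callers keep N >= 4 so the
   unsigned subtraction does not wrap.  */
static int
first_va_lanes_as_written (uint32_t n)
{
  return active_lanes (model_vctp32q (n - 4));
}

/* Lanes that would be loaded from 'a' in the first iteration if the
   loop were tail-predicated on N (the assumption stated in the lead-in).  */
static int
first_va_lanes_tail_predicated (uint32_t n)
{
  return active_lanes (model_vctp32q (n));
}
```

For `n = 6`, the model gives two active lanes for the load of `a` as written but four under the assumed tail predication, so the two versions of the loop would not load the same elements.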