[3/3] arm, mve: Detect uses of vctp_vpr_generated inside subregs
Checks
Context | Check | Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm | success | Test passed
Commit Message
Fix a case where the analysis in 'arm_attempt_dlstp_transform' failed to
detect uses of vctp_vpr_generated because the use was inside a SUBREG, which
rtx_equal_p does not catch. Using reg_overlap_mentioned_p is much more
robust.
gcc/ChangeLog:
* config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.
---
gcc/config/arm/arm.cc | 3 ++-
.../gcc.target/arm/mve/dlstp-invalid-asm.c | 20 ++++++++++++++++++-
2 files changed, 21 insertions(+), 2 deletions(-)
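The difference between the two checks described in the commit message can be illustrated with a small toy model. This is only a sketch: `toy_rtx`, `toy_equal_p` and `toy_overlap_mentioned_p` are hypothetical names invented for illustration and are not GCC's `rtx`, `rtx_equal_p` or `reg_overlap_mentioned_p`. The model assumes a structural comparison fails as soon as one side is wrapped in a SUBREG, whereas an overlap-style check looks through the wrapper to the underlying register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for RTL expressions.  NOT GCC's rtx type.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  unsigned regno;               /* Meaningful for TOY_REG only.  */
  const struct toy_rtx *inner;  /* Meaningful for TOY_SUBREG only.  */
};

/* Structural equality, in the spirit of rtx_equal_p: a REG never
   compares equal to a SUBREG wrapping that same REG.  */
bool
toy_equal_p (const struct toy_rtx *x, const struct toy_rtx *y)
{
  if (x->code != y->code)
    return false;
  if (x->code == TOY_REG)
    return x->regno == y->regno;
  return toy_equal_p (x->inner, y->inner);
}

/* Overlap-style check, in the spirit of reg_overlap_mentioned_p:
   strip SUBREG wrappers on both sides, then compare the registers.  */
bool
toy_overlap_mentioned_p (const struct toy_rtx *reg, const struct toy_rtx *in)
{
  while (reg->code == TOY_SUBREG)
    reg = reg->inner;
  while (in->code == TOY_SUBREG)
    in = in->inner;
  return reg->regno == in->regno;
}
```

With a register `vpr` and a use of the form SUBREG(`vpr`), `toy_equal_p` reports no match while `toy_overlap_mentioned_p` reports a match, which mirrors the missed detection the patch addresses.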
Comments
On 11/29/24 11:30, Andre Vieira wrote:
>
> Address a problem we were having where we were missing on detecting uses of
> vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform' because
> the use was inside a SUBREG and rtx_equal_p does not catch that. Using
> reg_overlap_mentioned_p is much more robust.
>
> gcc/ChangeLog:
>
> * gcc/config/arm/arm.cc (arm_attempt_dlstp_transform): Use
> reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
> vctp_vpr_generated inside subregs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
> (test10a): ... this.
> (test10b): Variation of test10a with a small change to trigger wrong
> codegen.
Thanks, this patch is OK too, provided you update the comment before
test10b as I requested on your previous version.
> ---
> gcc/config/arm/arm.cc | 3 ++-
> .../gcc.target/arm/mve/dlstp-invalid-asm.c | 20 ++++++++++++++++++-
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
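For readers less familiar with MVE tail predication, the following scalar sketch models the predicate that `vctp32q` produces, to show why the initial `vctp32q (n-4)` in test10b is not interchangeable with the `vctp32q (n)` generated inside the loop. `vctp32_model` is a hypothetical helper written for illustration, not part of `arm_mve.h`; it assumes the usual layout of a 16-bit predicate with four bits per 32-bit lane and the first `n` lanes (at most four) active.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 16-bit predicate produced by vctp32q (n).
   Assumption: each of the four 32-bit lanes owns four predicate bits,
   and a lane is active when its index is below n.  */
uint16_t
vctp32_model (uint32_t n)
{
  uint16_t pred = 0;
  for (uint32_t lane = 0; lane < 4; lane++)
    if (lane < n)
      pred |= (uint16_t) (0xFu << (4 * lane));
  return pred;
}
```

For example, with `n = 6` the model gives `vctp32_model (n - 4) == 0x00FF` (two active lanes) but `vctp32_model (n) == 0xFFFF` (all four lanes), so a load predicated on the first value presumably cannot be folded into a loop that is implicitly predicated on the second.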
@@ -35847,7 +35847,8 @@ arm_attempt_dlstp_transform (rtx label)
   df_ref insn_uses = NULL;
   FOR_EACH_INSN_USE (insn_uses, insn)
   {
-    if (rtx_equal_p (vctp_vpr_generated, DF_REF_REG (insn_uses)))
+    if (reg_overlap_mentioned_p (vctp_vpr_generated,
+                                 DF_REF_REG (insn_uses)))
       {
         end_sequence ();
         return 1;
@@ -128,7 +128,7 @@ void test9 (int32_t *a, int32_t *b, int32_t *c, int n)
 }

 /* Using a VPR that gets re-generated within the loop. */
-void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
+void test10a (int32_t *a, int32_t *b, int32_t *c, int n)
 {
   mve_pred16_t p = vctp32q (n);
   while (n > 0)
@@ -145,6 +145,24 @@ void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
     }
 }

+/* Using a VPR that gets re-generated within the loop. */
+void test10b (int32_t *a, int32_t *b, int32_t *c, int n)
+{
+  mve_pred16_t p = vctp32q (n-4);
+  while (n > 0)
+    {
+      int32x4_t va = vldrwq_z_s32 (a, p);
+      p = vctp32q (n);
+      int32x4_t vb = vldrwq_z_s32 (b, p);
+      int32x4_t vc = vaddq_x_s32 (va, vb, p);
+      vstrwq_p_s32 (c, vc, p);
+      c += 4;
+      a += 4;
+      b += 4;
+      n -= 4;
+    }
+}
+
 /* Using vctp32q_m instead of vctp32q. */
 void test11 (int32_t *a, int32_t *b, int32_t *c, int n, mve_pred16_t p0)
 {