[v3] xtensa: Eliminate the use of callee-saved register that saves and restores only once
Commit Message
On 2023/01/17 20:23, Max Filippov wrote:
> Hi Suwa-san,
Hi!
> There's still a few regressions in tests with -fcompare-debug because
> code generated with -g and without it is different:
> E.g. check the following test with -g0 and -g:
Again debug_insn is the problem...
=====
In the case of the CALL0 ABI, values that must be preserved across
function calls are placed in the callee-saved registers (A12 through
A15) and referenced later. However, it is often the case that such a
register is set only once and referenced only once, each time by a
simple register-register move. (The frame pointer is excluded, because
it is needed to recover the stack pointer.)
For example, suppose there are no other occurrences of register A14
in the following code:
;; before
; prologue {
...
s32i.n a14, sp, 16
...
; } prologue
...
mov.n a14, a6
...
call0 foo
...
mov.n a8, a14
...
; epilogue {
...
l32i.n a14, sp, 16
...
; } epilogue
The code can be rewritten like this:
;; after
; prologue {
...
(deleted)
...
; } prologue
...
s32i.n a6, sp, 16
...
call0 foo
...
l32i.n a8, sp, 16
...
; epilogue {
...
(deleted)
...
; } epilogue
This patch introduces a new peephole2 pattern that implements the above.
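As an illustration only, here is a minimal C sketch of source code that
could lead to the "before" shape above. It assumes the CALL0 ABI passes
the fifth integer argument in A6; the names foo, bar and check_bar are
hypothetical, and whether the compiler actually picks A14 depends on the
GCC version and options.

```c
#include <assert.h>

/* Hypothetical callee; noinline keeps the call visible in the output.  */
__attribute__ ((noinline)) int
foo (void)
{
  return 1;
}

/* The fifth argument 'e' (assumed to arrive in a6 under CALL0) must
   survive the call to foo, so the compiler may copy it into a
   callee-saved register before the call and read it back afterwards.  */
int
bar (int a, int b, int c, int d, int e)
{
  (void) a; (void) b; (void) c; (void) d;
  int r = foo ();
  return r + e;
}

/* Checks only the C semantics, not the generated assembly.  */
void
check_bar (void)
{
  assert (bar (1, 2, 3, 4, 5) == 6);
}
```

Compiling such a function with and without the patch and comparing the
assembly is one way to see whether the prologue save and epilogue
restore disappear.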
gcc/ChangeLog:
* config/xtensa/xtensa.md: New peephole2 pattern that eliminates
the use of a callee-saved register that only saves and restores
another register once, by using its stack slot directly.
---
gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
Comments
Hi Suwa-san,
On Tue, Jan 17, 2023 at 8:23 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
> In the case of the CALL0 ABI, values that must be retained before and
> after function calls are placed in the callee-saved registers (A12
> through A15) and referenced later. However, it is often the case that
> the save and the reference are each only once and a simple register-
> register move (the frame pointer is needed to recover the stack pointer
> and must be excluded).
>
> e.g. in the following example, if there are no other occurrences of
> register A14:
>
> ;; before
> ; prologue {
> ...
> s32i.n a14, sp, 16
> ...
> ; } prologue
> ...
> mov.n a14, a6
> ...
> call0 foo
> ...
> mov.n a8, a14
> ...
> ; epilogue {
> ...
> l32i.n a14, sp, 16
> ...
> ; } epilogue
>
> It can be possible like this:
>
> ;; after
> ; prologue {
> ...
> (deleted)
> ...
> ; } prologue
> ...
> s32i.n a6, sp, 16
> ...
> call0 foo
> ...
> l32i.n a8, sp, 16
> ...
> ; epilogue {
> ...
> (deleted)
> ...
> ; } epilogue
>
> This patch introduces a new peephole2 pattern that implements the above.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md: New peephole2 pattern that eliminates
> the use of callee-saved register that saves and restores only once
> for other register, by using its stack slot directly.
> ---
> gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 62 insertions(+)
This change introduces a bunch of different test failures:
FAIL: gcc.c-torture/execute/builtins/strpbrk.c execution, -O2
FAIL: gcc.c-torture/execute/builtins/strpbrk.c execution, -O3 -g
FAIL: gcc.c-torture/execute/builtins/strpbrk.c execution, -Os
FAIL: gcc.c-torture/execute/builtins/strpbrk.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/strstr-asm.c execution, -Os
FAIL: gcc.c-torture/execute/20001130-1.c -Os execution test
FAIL: gcc.c-torture/execute/20040311-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20040311-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040311-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20121108-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20121108-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/20121108-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20121108-1.c -Os execution test
FAIL: gcc.c-torture/execute/20121108-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20121108-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/20140622-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20140622-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20140622-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20141022-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20141022-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/20141022-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20141022-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20141022-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/20141107-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/961213-1.c -Os execution test
FAIL: gcc.c-torture/execute/builtin-bitops-1.c -Os execution test
FAIL: gcc.c-torture/execute/cvt-1.c -O2 execution test
FAIL: gcc.c-torture/execute/cvt-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/cvt-1.c -Os execution test
FAIL: gcc.c-torture/execute/cvt-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr40747.c -O2 execution test
FAIL: gcc.c-torture/execute/pr40747.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr40747.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr60960.c -O2 execution test
FAIL: gcc.c-torture/execute/pr60960.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr60960.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr60960.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/ieee/fp-cmp-5.c execution, -O2
FAIL: gcc.c-torture/execute/ieee/fp-cmp-5.c execution, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
FAIL: gcc.c-torture/execute/ieee/fp-cmp-5.c execution, -O3 -g
FAIL: gcc.c-torture/execute/ieee/fp-cmp-5.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/ieee/fp-cmp-5.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.dg/pr103721.c execution test
FAIL: gcc.dg/pr38364.c execution test
FAIL: gcc.dg/pr92301.c execution test
FAIL: gcc.dg/strlenopt-92.c execution test
FAIL: gcc.dg/torture/pr90553.c -O2 execution test
FAIL: gcc.dg/torture/pr90553.c -O3 -g execution test
FAIL: gcc.dg/torture/pr90553.c -Os execution test
FAIL: gcc.dg/torture/pr90553.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/pr90553.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/stackalign/vararg-1.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/vararg-1.c -O2 -fpic execution test
FAIL: gcc.dg/torture/stackalign/vararg-1.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/vararg-1.c -O3 -g -fpic execution test
FAIL: gcc.dg/torture/stackalign/vararg-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/vararg-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -O2 execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -O3 -g execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -Os execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -O2 -fpic execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -O3 -g -fpic execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -Os -fpic execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -O2 -fPIC execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -O3 -g -fPIC execution test
FAIL: gcc.dg/torture/tls/thr-init-2.c -Os -fPIC execution test
--
Thanks.
-- Max
@@ -3024,3 +3024,65 @@ FALLTHRU:;
operands[1] = GEN_INT (imm0);
operands[2] = GEN_INT (imm1);
})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+        (match_operand:SI 1 "reload_operand"))]
+  "!TARGET_WINDOWED_ABI && df
+   && epilogue_contains (insn)
+   && ! call_used_or_fixed_reg_p (REGNO (operands[0]))
+   && (!frame_pointer_needed
+       || REGNO (operands[0]) != HARD_FRAME_POINTER_REGNUM)"
+  [(const_int 0)]
+{
+  rtx reg = operands[0], pattern;
+  rtx_insn *insnP = NULL, *insnS = NULL, *insnR = NULL;
+  df_ref ref;
+  rtx_insn *insn;
+  for (ref = DF_REG_DEF_CHAIN (REGNO (reg));
+       ref; ref = DF_REF_NEXT_REG (ref))
+    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
+        || ! NONJUMP_INSN_P (insn = DF_REF_INSN (ref)))
+      continue;
+    else if (insn == curr_insn)
+      continue;
+    else if (GET_CODE (pattern = PATTERN (insn)) == SET
+             && rtx_equal_p (SET_DEST (pattern), reg)
+             && REG_P (SET_SRC (pattern)))
+      {
+        if (insnS)
+          FAIL;
+        insnS = insn;
+        continue;
+      }
+    else
+      FAIL;
+  for (ref = DF_REG_USE_CHAIN (REGNO (reg));
+       ref; ref = DF_REF_NEXT_REG (ref))
+    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
+        || ! NONJUMP_INSN_P (insn = DF_REF_INSN (ref)))
+      continue;
+    else if (prologue_contains (insn))
+      {
+        insnP = insn;
+        continue;
+      }
+    else if (GET_CODE (pattern = PATTERN (insn)) == SET
+             && rtx_equal_p (SET_SRC (pattern), reg)
+             && REG_P (SET_DEST (pattern)))
+      {
+        if (insnR)
+          FAIL;
+        insnR = insn;
+        continue;
+      }
+    else
+      FAIL;
+  if (!insnP || !insnS || !insnR)
+    FAIL;
+  SET_DEST (PATTERN (insnS)) = copy_rtx (operands[1]);
+  df_insn_rescan (insnS);
+  SET_SRC (PATTERN (insnR)) = copy_rtx (operands[1]);
+  df_insn_rescan (insnR);
+  set_insn_deleted (insnP);
+})
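
For readers unfamiliar with the df chains, the decision made by the
preparation code above can be modelled roughly as follows. This is a
standalone sketch, not GCC code: struct ref, its fields and
can_use_stack_slot are hypothetical stand-ins for the df_ref walk, and
the is_debug flag models only the debug-insn part of the
NONJUMP_INSN_P filter. Skipping debug references matters because
-fcompare-debug requires the same decision with and without -g.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record of one reference to the register.  */
enum ref_kind { REF_DEF, REF_USE };

struct ref
{
  enum ref_kind kind;  /* the insn sets (DEF) or reads (USE) the register */
  bool is_debug;       /* debug insn, present only when compiling with -g */
  bool in_prologue;    /* the prologue store that saves the register */
  bool is_curr;        /* the epilogue reload matched by the pattern */
  bool is_reg_move;    /* a plain register-register move */
};

/* Returns true when, besides the prologue save and the matched epilogue
   reload, the register has exactly one def and exactly one use and both
   are register-register moves.  */
bool
can_use_stack_slot (const struct ref *refs, size_t n)
{
  int saves = 0, sets = 0, reads = 0;

  assert (refs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      const struct ref *r = &refs[i];

      if (r->is_debug)
        continue;                 /* must not influence the decision */
      if (r->kind == REF_DEF)
        {
          if (r->is_curr)
            continue;             /* the reload itself */
          if (!r->is_reg_move)
            return false;         /* any other kind of def: give up */
          sets++;
        }
      else
        {
          if (r->in_prologue)
            {
              saves++;
              continue;
            }
          if (!r->is_reg_move)
            return false;         /* any other kind of use: give up */
          reads++;
        }
    }
  return saves >= 1 && sets == 1 && reads == 1;
}
```

In this model, the four references from the "before" example (prologue
save, mov.n a14, a6, mov.n a8, a14, epilogue reload) are accepted, and
adding a debug-only reference does not change the answer.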