From patchwork Tue Feb 20 11:30:24 2024
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 86030
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [pushed] aarch64: Fix streaming-compatible code with -mtrack-speculation [PR113805]
Date: Tue, 20 Feb 2024 11:30:24 +0000

This patch makes -mtrack-speculation work on streaming-compatible
functions.  There were two related issues.  The first is that the
streaming-compatible code was using TB(N)Z unconditionally, whereas
those instructions are not allowed with speculation tracking.  That
part can be fixed in a similar way to the recent eh_return fix
(PR112987).
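For concreteness, the kind of function affected is a streaming-compatible
function that calls something with a different SM state, so that the
compiler has to wrap the call in a conditional SMSTOP/SMSTART switch.
A reduced sketch (modelled on the new call_sm_switch_11.c test rather
than copied from it):

  /* Illustrative only: compile with -O -mtrack-speculation.  The call
     below is wrapped in a conditional SMSTOP/SMSTART switch, guarded
     by a branch on bit 0 of SVCR (PSTATE.SM).  */
  void ns_callee ();

  void
  sc_caller () [[arm::streaming_compatible]]
  {
    ns_callee ();
  }

Without speculation tracking that guard is a single TB(N)Z on SVCR; with
-mtrack-speculation the patch instead emits a flag-setting TST and a
conditional branch (see aarch64_gen_test_and_branch below), which the
CSEL-based tracking sequence can key off.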
The second issue was that the speculation-tracking pass runs before
some of the conditional branches are inserted.  It isn't safe to insert
the branches any earlier, so the patch instead adds a second
speculation-tracking pass that runs afterwards.  The new pass is only
used for streaming-compatible functions.

The testcase is adapted from call_sm_switch_1.c.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
	PR target/113805
	* config/aarch64/aarch64-passes.def (pass_late_track_speculation):
	New pass.
	* config/aarch64/aarch64-protos.h (make_pass_late_track_speculation):
	Declare.
	* config/aarch64/aarch64.md (is_call): New attribute.
	(*and<mode>3nr_compare0): Rename to...
	(@aarch64_and<mode>3nr_compare0): ...this.
	* config/aarch64/aarch64-sme.md (aarch64_get_sme_state)
	(aarch64_tpidr2_save, aarch64_tpidr2_restore): Add is_call
	attributes.
	* config/aarch64/aarch64-speculation.cc: Update file comment to
	describe the new late pass.
	(aarch64_do_track_speculation): Handle is_call insns like other
	calls.
	(pass_track_speculation): Add an is_late member variable.
	(pass_track_speculation::gate): Run the late pass for streaming-
	compatible functions and the early pass for other functions.
	(make_pass_track_speculation): Update accordingly.
	(make_pass_late_track_speculation): New function.
	* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): New
	function.
	(aarch64_guard_switch_pstate_sm): Use it.

gcc/testsuite/
	PR target/113805
	* gcc.target/aarch64/sme/call_sm_switch_11.c: New test.
---
 gcc/config/aarch64/aarch64-passes.def     |   1 +
 gcc/config/aarch64/aarch64-protos.h       |   1 +
 gcc/config/aarch64/aarch64-sme.md         |   3 +
 gcc/config/aarch64/aarch64-speculation.cc |  64 ++++--
 gcc/config/aarch64/aarch64.cc             |  26 ++-
 gcc/config/aarch64/aarch64.md             |   8 +-
 .../aarch64/sme/call_sm_switch_11.c       | 209 ++++++++++++++++++
 7 files changed, 291 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c

diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 769d48f4faa..e20e6f4395b 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -22,6 +22,7 @@ INSERT_PASS_BEFORE (pass_sched, 1, pass_aarch64_early_ra);
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
 INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
 INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstate_sm);
+INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_late_track_speculation);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a0b142e0b94..bd719b992a5 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1075,6 +1075,7 @@ std::string aarch64_get_extension_string_for_isa_flags (aarch64_feature_flags,
 rtl_opt_pass *make_pass_aarch64_early_ra (gcc::context *);
 rtl_opt_pass *make_pass_fma_steering (gcc::context *);
 rtl_opt_pass *make_pass_track_speculation (gcc::context *);
+rtl_opt_pass *make_pass_late_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
 rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index 6edcdd96679..81d941871ac 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -105,6 +105,7 @@ (define_insn "aarch64_get_sme_state"
    (clobber (reg:CC CC_REGNUM))]
   ""
   "bl\t__arm_sme_state"
+  [(set_attr "is_call" "yes")]
 )
 
 (define_insn "aarch64_read_svcr"
@@ -242,6 +243,7 @@ (define_insn "aarch64_tpidr2_save"
    (clobber (reg:CC CC_REGNUM))]
   ""
   "bl\t__arm_tpidr2_save"
+  [(set_attr "is_call" "yes")]
 )
 
 ;; Set PSTATE.ZA to 1.  If ZA was previously dormant or active,
@@ -358,6 +360,7 @@ (define_insn "aarch64_tpidr2_restore"
    (clobber (reg:CC CC_REGNUM))]
   ""
   "bl\t__arm_tpidr2_restore"
+  [(set_attr "is_call" "yes")]
 )
 
 ;; Check whether a lazy save set up by aarch64_save_za was committed
diff --git a/gcc/config/aarch64/aarch64-speculation.cc b/gcc/config/aarch64/aarch64-speculation.cc
index fdcc544b4d6..54f5ada7ed6 100644
--- a/gcc/config/aarch64/aarch64-speculation.cc
+++ b/gcc/config/aarch64/aarch64-speculation.cc
@@ -39,10 +39,10 @@
 #include "insn-config.h"
 #include "recog.h"
 
-/* This pass scans the RTL just before the final branch
-   re-organisation pass.  The aim is to identify all places where
-   there is conditional control flow and to insert code that tracks
-   any speculative execution of a conditional branch.
+/* This pass scans the RTL insns late in the RTL pipeline.  The aim is
+   to identify all places where there is conditional control flow and
+   to insert code that tracks any speculative execution of a conditional
+   branch.
 
    To do this we reserve a call-clobbered register (so that it can be
    initialized very early in the function prologue) that can then be
@@ -131,11 +131,22 @@
    carry the tracking state in SP for this period of time unless the
    tracker value is needed at that point in time.
 
-   We run the pass just before the final branch reorganization pass so
-   that we can handle most of the conditional branch cases using the
-   standard edge insertion code.  The reorg pass will hopefully clean
-   things up for afterwards so that the results aren't too
-   horrible.  */
+   We run the pass while the CFG is still present so that we can handle
+   most of the conditional branch cases using the standard edge insertion
+   code.  Where possible, we prefer to run the pass just before the final
+   branch reorganization pass.  That pass will then hopefully clean things
+   up afterwards so that the results aren't too horrible.
+
+   However, we must run the pass after all conditional branches have
+   been inserted.  switch_pstate_sm inserts conditional branches for
+   streaming-compatible code, and so for streaming-compatible functions,
+   this pass must run after that one.
+
+   We handle this by having two copies of the pass: the normal one that
+   runs before branch reorganization, and a "late" one that runs just
+   before late_thread_prologue_and_epilogue.  The two passes have
+   mutually exclusive gates, with the normal pass being chosen wherever
+   possible.  */
 
 /* Generate a code sequence to clobber SP if speculating incorreclty.  */
 static rtx_insn *
@@ -315,11 +326,15 @@ aarch64_do_track_speculation ()
 	  needs_tracking = true;
 	}
 
-      if (CALL_P (insn))
+      if (CALL_P (insn)
+	  || (NONDEBUG_INSN_P (insn)
+	      && recog_memoized (insn) >= 0
+	      && get_attr_is_call (insn) == IS_CALL_YES))
 	{
 	  bool tailcall
-	    = (SIBLING_CALL_P (insn)
-	       || find_reg_note (insn, REG_NORETURN, NULL_RTX));
+	    = (CALL_P (insn)
+	       && (SIBLING_CALL_P (insn)
+		   || find_reg_note (insn, REG_NORETURN, NULL_RTX)));
 
 	  /* Tailcalls are like returns, we can eliminate the
 	     transfer between the tracker register and SP if we
@@ -461,21 +476,28 @@ const pass_data pass_data_aarch64_track_speculation =
 
 class pass_track_speculation : public rtl_opt_pass
 {
- public:
-  pass_track_speculation(gcc::context *ctxt)
-    : rtl_opt_pass(pass_data_aarch64_track_speculation, ctxt)
-    {}
+public:
+  pass_track_speculation(gcc::context *ctxt, bool is_late)
+    : rtl_opt_pass(pass_data_aarch64_track_speculation, ctxt),
+      is_late (is_late)
+    {}
 
   /* opt_pass methods:  */
   virtual bool gate (function *)
     {
-      return aarch64_track_speculation;
+      return (aarch64_track_speculation
+	      && (is_late == bool (TARGET_STREAMING_COMPATIBLE)));
     }
 
   virtual unsigned int execute (function *)
     {
      return aarch64_do_track_speculation ();
    }
+
+private:
+  /* Whether this is the late pass that runs before late prologue/epilogue
+     insertion, or the normal pass that runs before branch reorganization.  */
+  bool is_late;
 }; // class pass_track_speculation.
 } // anon namespace.
 
@@ -483,5 +505,11 @@ class pass_track_speculation : public rtl_opt_pass
 rtl_opt_pass *
 make_pass_track_speculation (gcc::context *ctxt)
 {
-  return new pass_track_speculation (ctxt);
+  return new pass_track_speculation (ctxt, /*is_late=*/false);
+}
+
+rtl_opt_pass *
+make_pass_late_track_speculation (gcc::context *ctxt)
+{
+  return new pass_track_speculation (ctxt, /*is_late=*/true);
 }
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 32eae49d4e9..76f1230d81a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2659,6 +2659,27 @@ aarch64_gen_compare_zero_and_branch (rtx_code code, rtx x,
   return gen_rtx_SET (pc_rtx, x);
 }
 
+/* Return an rtx that branches to LABEL based on the value of bit BITNUM of X.
+   If CODE is NE, it branches to LABEL when the bit is set; if CODE is EQ,
+   it branches to LABEL when the bit is clear.  */
+
+static rtx
+aarch64_gen_test_and_branch (rtx_code code, rtx x, int bitnum,
+			     rtx_code_label *label)
+{
+  auto mode = GET_MODE (x);
+  if (aarch64_track_speculation)
+    {
+      auto mask = gen_int_mode (HOST_WIDE_INT_1U << bitnum, mode);
+      emit_insn (gen_aarch64_and3nr_compare0 (mode, x, mask));
+      rtx cc_reg = gen_rtx_REG (CC_NZVmode, CC_REGNUM);
+      rtx x = gen_rtx_fmt_ee (code, CC_NZVmode, cc_reg, const0_rtx);
+      return gen_condjump (x, cc_reg, label);
+    }
+  return gen_aarch64_tb (code, mode, mode,
+			 x, gen_int_mode (bitnum, mode), label);
+}
+
 /* Consider the operation:
 
      OPERANDS[0] = CODE (OPERANDS[1], OPERANDS[2]) + OPERANDS[3]
@@ -4881,8 +4902,9 @@ aarch64_guard_switch_pstate_sm (rtx old_svcr, aarch64_feature_flags local_mode)
   gcc_assert (local_mode != 0);
   auto already_ok_cond = (local_mode & AARCH64_FL_SM_ON ? NE : EQ);
   auto *label = gen_label_rtx ();
-  auto *jump = emit_jump_insn (gen_aarch64_tb (already_ok_cond, DImode, DImode,
-					       old_svcr, const0_rtx, label));
+  auto branch = aarch64_gen_test_and_branch (already_ok_cond, old_svcr, 0,
+					     label);
+  auto *jump = emit_jump_insn (branch);
   JUMP_LABEL (jump) = label;
   return label;
 }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6d39d2109e1..33fbe1b2e8d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -439,6 +439,12 @@ (define_enum "arches" [any rcpc8_4 fp fp_q base_simd nobase_simd
 
 (define_enum_attr "arch" "arches" (const_string "any"))
 
+;; Whether a normal INSN in fact contains a call.  Sometimes we represent
+;; calls to functions that use an ad-hoc ABI as normal insns, both for
+;; optimization reasons and to avoid the need to describe the ABI to
+;; target-independent code.
+(define_attr "is_call" "no,yes" (const_string "no"))
+
 ;; [For compatibility with Arm in pipeline models]
 ;; Attribute that specifies whether or not the instruction touches fp
 ;; registers.
@@ -5395,7 +5401,7 @@ (define_insn "*ands_compare0"
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "*and<mode>3nr_compare0"
+(define_insn "@aarch64_and<mode>3nr_compare0"
   [(set (reg:CC_NZV CC_REGNUM)
 	(compare:CC_NZV
 	 (and:GPI (match_operand:GPI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c
new file mode 100644
index 00000000000..ee6f98737d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c
@@ -0,0 +1,209 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -funwind-tables -mtrack-speculation" }
+// { dg-final { check-function-bodies "**" "" } }
+
+void ns_callee ();
+  void s_callee () [[arm::streaming]];
+  void sc_callee () [[arm::streaming_compatible]];
+
+void ns_callee_stack (int, int, int, int, int, int, int, int, int);
+
+struct callbacks {
+  void (*ns_ptr) ();
+  void (*s_ptr) () [[arm::streaming]];
+  void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+/*
+** sc_caller_sme:
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	stp	x29, x30, \[sp, #?-96\]!
+**	mov	x29, sp
+**	cntd	x16
+**	str	x16, \[sp, #?24\]
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	mrs	x16, svcr
+**	str	x16, \[x29, #?16\]
+**	ldr	x16, \[x29, #?16\]
+**	tst	x16, #?1
+**	beq	[^\n]*
+**	csel	x15, x15, xzr, ne
+**	smstop	sm
+**	b	[^\n]*
+**	csel	x15, x15, xzr, eq
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	bl	ns_callee
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	ldr	x16, \[x29, #?16\]
+**	tst	x16, #?1
+**	beq	[^\n]*
+**	csel	x15, x15, xzr, ne
+**	smstart	sm
+**	b	[^\n]*
+**	csel	x15, x15, xzr, eq
+**	ldr	x16, \[x29, #?16\]
+**	tst	x16, #?1
+**	bne	[^\n]*
+**	csel	x15, x15, xzr, eq
+**	smstart	sm
+**	b	[^\n]*
+**	csel	x15, x15, xzr, ne
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	bl	s_callee
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	ldr	x16, \[x29, #?16\]
+**	tst	x16, #?1
+**	bne	[^\n]*
+**	csel	x15, x15, xzr, eq
+**	smstop	sm
+**	b	[^\n]*
+**	csel	x15, x15, xzr, ne
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	bl	sc_callee
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x29, x30, \[sp\], #?96
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	ret
+*/
+void
+sc_caller_sme () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+}
+
+#pragma GCC target "+nosme"
+
+/*
+** sc_caller:
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	stp	x29, x30, \[sp, #?-96\]!
+**	mov	x29, sp
+**	cntd	x16
+**	str	x16, \[sp, #?24\]
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	bl	__arm_sme_state
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	str	x0, \[x29, #?16\]
+**	...
+**	bl	sc_callee
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x29, x30, \[sp\], #?96
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	ret
+*/
+void
+sc_caller () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x0:
+**	...
+**	mov	x10, x0
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	bl	__arm_sme_state
+**	...
+**	str	wzr, \[x10\]
+**	...
+*/
+void
+sc_caller_x0 (int *ptr) [[arm::streaming_compatible]]
+{
+  *ptr = 0;
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x1:
+**	...
+**	mov	x10, x0
+**	mov	x11, x1
+**	mov	x14, sp
+**	and	x14, x14, x15
+**	mov	sp, x14
+**	bl	__arm_sme_state
+**	...
+**	str	w11, \[x10\]
+**	...
*/
+void
+sc_caller_x1 (int *ptr, int a) [[arm::streaming_compatible]]
+{
+  *ptr = a;
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_stack:
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	sub	sp, sp, #112
+**	stp	x29, x30, \[sp, #?16\]
+**	add	x29, sp, #?16
+**	...
+**	stp	d8, d9, \[sp, #?48\]
+**	...
+**	bl	__arm_sme_state
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	str	x0, \[x29, #?16\]
+**	...
+**	bl	ns_callee_stack
+**	cmp	sp, #?0
+**	csetm	x15, ne
+**	ldr	x16, \[x29, #?16\]
+**	tst	x16, #?1
+**	beq	[^\n]*
+**	csel	x15, x15, xzr, ne
+**	smstart	sm
+**	...
+*/
+void
+sc_caller_stack () [[arm::streaming_compatible]]
+{
+  ns_callee_stack (0, 0, 0, 0, 0, 0, 0, 0, 0);
+}
+
+/* { dg-final { scan-assembler {sc_caller_sme:(?:(?!ret).)*\.cfi_offset 46, -72\n} } } */
+/* { dg-final { scan-assembler {sc_caller:(?:(?!ret).)*\.cfi_offset 46, -72\n} } } */
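For reference, the two copies of the pass have mutually exclusive gates:
with -mtrack-speculation, ordinary functions are still instrumented by
the existing pass before branch reorganization, while streaming-compatible
functions are instrumented by the new late copy, which runs after
switch_pstate_sm has inserted its conditional branches.  A rough sketch
(illustrative only; these functions are not part of the new test):

  int g;

  /* Instrumented by the existing, early copy of pass_track_speculation.  */
  void
  plain_fn (int x)
  {
    if (x)
      g = 1;
  }

  /* Instrumented by the new late copy, which runs after switch_pstate_sm.  */
  void
  sc_fn (int x) [[arm::streaming_compatible]]
  {
    if (x)
      g = 1;
  }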