[RFC] Prevent the scheduler from moving prefetch instructions when expanding __builtin_prefetch [PR 116713]
Checks
Context                                        | Check   | Description
-----------------------------------------------+---------+-------------
linaro-tcwg-bot/tcwg_gcc_build--master-arm     | success | Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm     | success | Test passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | fail    | Test failed
Commit Message
The prefetch instruction emitted by __builtin_prefetch is reordered by GCC, but not by clang[0]. GCC's behavior is surprising because when using the builtin you want the instruction placed at the exact point where you wrote it. Moving it around, especially across loads/stores, may end up being a pessimization. Adding a blockage instruction before the prefetch prevents the scheduler from moving it.
[0] https://godbolt.org/z/Ycjr7Tq8b
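For illustration, a minimal sketch (hypothetical, not taken from the PR) of code where placement matters; the intent is for the prefetch to sit between the load and the store, but the scheduler is currently free to move it across either:

void
walk (int *cur, int *next)
{
  int a = cur[0];             /* load before the prefetch */
  __builtin_prefetch (next);  /* intended to stay exactly here */
  cur[1] = a + 1;             /* store after the prefetch */
}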
-- 8< --
Comments
On Fri, 2024-09-27 at 00:26 -0400, Pietro Monteiro wrote:
> The prefetch instruction emitted by __builtin_prefetch is reordered by GCC, but not by clang[0]. GCC's behavior is surprising because when using the builtin you want the instruction placed at the exact point where you wrote it. Moving it around, especially across loads/stores, may end up being a pessimization. Adding a blockage instruction before the prefetch prevents the scheduler from moving it.
>
> [0] https://godbolt.org/z/Ycjr7Tq8b
>
>
As pointed out by Andrew in the PR, this issue has been around for a while:
see
https://gcc.gnu.org/legacy-ml/gcc-patches/2009-05/msg01233.html
and
https://gcc.gnu.org/legacy-ml/gcc-patches/2009-09/msg00130.html
However, those fix the reordering only for sched2. Hypothetically, the
reordering could also happen before sched2, in some other RTL pass. So
perhaps it's better to insert the barrier right when the builtin is emitted.
Best regards,
Oleg Endo
> -- 8< --
>
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 37c7c98e5c..fec751e0d6 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp)
> create_integer_operand (&ops[1], INTVAL (op1));
> create_integer_operand (&ops[2], INTVAL (op2));
> if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops))
> - return;
> + {
> + /* Prevent the prefetch from being moved. */
> + rtx_insn *last = get_last_insn ();
> + emit_insn_before (gen_blockage (), last);
> + return;
> + }
> }
>
> /* Don't do anything with direct references to volatile memory, but
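For context on the mechanism the patch relies on: gen_blockage emits the target's "blockage" pattern, which backends typically define as a zero-length unspec_volatile that no pass may schedule instructions across. A generic sketch of such a pattern (not from any particular port):

;; Scheduling barrier: emits no code, but being volatile no pass
;; may move instructions across it.
(define_insn "blockage"
  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
  ""
  ""
  [(set_attr "length" "0")])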
On Fri, Sep 27, 2024 at 6:27 AM Pietro Monteiro
<pietro@sociotechnical.xyz> wrote:
>
> The prefetch instruction emitted by __builtin_prefetch is reordered by GCC, but not by clang[0]. GCC's behavior is surprising because when using the builtin you want the instruction placed at the exact point where you wrote it. Moving it around, especially across loads/stores, may end up being a pessimization. Adding a blockage instruction before the prefetch prevents the scheduler from moving it.
>
> [0] https://godbolt.org/z/Ycjr7Tq8b
I think the testcase is quite broken (aka not real-world). I would also suggest
that a hard scheduling barrier isn't the correct tool (see Oleg's response), but
instead prefetch should properly model a data dependence so it only blocks
code motion for dependent accesses. On the GIMPLE side, disambiguation
happens solely based on pointer analysis; on RTL, where prefetch is likely
an UNSPEC, I would suggest modeling the dependence as a USE/CLOBBER
pair of a MEM.
Richard.
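A rough, untested sketch of that suggestion, assuming op0 holds the prefetch address and op1 the read/write flag, as in expand_builtin_prefetch:

  /* Model the prefetch's data dependence explicitly instead of
     emitting a full scheduling barrier.  */
  rtx mem = gen_rtx_MEM (BLKmode, op0);
  if (INTVAL (op1) != 0)
    /* Write prefetch: pretend to clobber the prefetched memory.  */
    emit_insn (gen_rtx_CLOBBER (VOIDmode, mem));
  else
    /* Read prefetch: pretend to use the prefetched memory.  */
    emit_insn (gen_rtx_USE (VOIDmode, mem));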
> -- 8< --
>
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 37c7c98e5c..fec751e0d6 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp)
> create_integer_operand (&ops[1], INTVAL (op1));
> create_integer_operand (&ops[2], INTVAL (op2));
> if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops))
> - return;
> + {
> + /* Prevent the prefetch from being moved. */
> + rtx_insn *last = get_last_insn ();
> + emit_insn_before (gen_blockage (), last);
> + return;
> + }
> }
>
> /* Don't do anything with direct references to volatile memory, but