AArch64: Block combine_and_move from creating FP literal loads
Checks
Context | Check | Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | success | Test passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm | success | Test passed
Commit Message
The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
combines moves. The movsf/df patterns allow all FP immediates since they rely
on a split pattern. However splits do not happen during IRA, so the result is
extra literal loads. To avoid this, use a more accurate check that blocks
creating FP immediates that need a split during combine_and_move.
double f(void) { return 128.0; }
-O2 -fno-schedule-insns gives:
adrp x0, .LC0
ldr d0, [x0, #:lo12:.LC0]
ret
After patch:
mov x0, 4638707616191610880
fmov d0, x0
ret
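For reference, the immediate 4638707616191610880 in the new sequence is just the IEEE-754 bit pattern of 128.0 (0x4060000000000000: sign 0, biased exponent 1023 + 7 = 0x406, zero mantissa). A tiny stand-alone C check, purely illustrative and not part of the patch:

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  int
  main (void)
  {
    double d = 128.0;
    uint64_t bits;
    /* Reinterpret the double's bits, like the mov/fmov pair does.  */
    memcpy (&bits, &d, sizeof bits);
    /* Prints: 4638707616191610880 0x4060000000000000  */
    printf ("%llu 0x%llx\n", (unsigned long long) bits,
            (unsigned long long) bits);
    return 0;
  }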
Passes bootstrap & regress, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarch64.md (movhf_aarch64): Use aarch64_valid_fp_move.
(movsf_aarch64): Likewise.
(movdf_aarch64): Likewise.
* config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
* config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
---
Comments
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
> combines moves. The movsf/df patterns allow all FP immediates since they rely
> on a split pattern. However splits do not happen during IRA, so the result is
> extra literal loads. To avoid this, use a more accurate check that blocks
> creating FP immediates that need a split during combine_and_move.
>
> double f(void) { return 128.0; }
>
> -O2 -fno-schedule-insns gives:
>
> adrp x0, .LC0
> ldr d0, [x0, #:lo12:.LC0]
> ret
>
> After patch:
>
> mov x0, 4638707616191610880
> fmov d0, x0
> ret
>
> Passes bootstrap & regress, OK for commit?
>
> gcc/ChangeLog:
> * config/aarch64/aarch64.md (movhf_aarch64): Use aarch64_valid_fp_move.
> (movsf_aarch64): Likewise.
> (movdf_aarch64): Likewise.
> * config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
> * config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 9be64913091443a62dc6d1a80c295dc52aaeb950..f4839413cf3e995871b728e2a36e332b89cd6abf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -758,6 +758,7 @@ bool aarch64_advsimd_struct_mode_p (machine_mode mode);
> opt_machine_mode aarch64_vq_mode (scalar_mode);
> opt_machine_mode aarch64_full_sve_mode (scalar_mode);
> bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
> +bool aarch64_valid_fp_move (rtx, rtx, machine_mode);
> bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
> bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
> HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index a6cc00e74abd4d96fa47f5612f271eb4fc95e7a1..130c1ff1e363db253b008e71c7e8e5deec8c46c8 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -11144,6 +11144,37 @@ aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode)
> return aarch64_simd_valid_mov_imm (v_op);
> }
>
> +/* Return TRUE if DST and SRC with mode MODE is a valid fp move. */
> +bool
> +aarch64_valid_fp_move (rtx dst, rtx src, machine_mode mode)
> +{
> + if (!TARGET_FLOAT)
> + return false;
> +
> + if (aarch64_reg_or_fp_zero (src, mode))
> + return true;
> +
> + if (!register_operand (dst, mode))
> + return false;
> +
> + if (MEM_P (src))
> + return true;
> +
> + if (!DECIMAL_FLOAT_MODE_P (mode))
> + {
> + if (aarch64_can_const_movi_rtx_p (src, mode)
> + || aarch64_float_const_representable_p (src)
> + || aarch64_float_const_zero_rtx_p (src))
> + return true;
> +
> + /* Block combine_and_move pass from creating FP immediates which
> + require a split during IRA - only allow this before regalloc. */
> + if (aarch64_float_const_rtx_p (src))
> + return can_create_pseudo_p () && !ira_in_progress;
> + }
> +
> + return can_create_pseudo_p ();
It's ok for instructions to require properties that are false during
early RTL passes and then transition to true. But they can't require
properties that go from true to false, since that would mean that
existing instructions become unrecognisable at certain points during
the compilation process.
Also, why are the conditions tighter for aarch64_float_const_rtx_p
(which we can split) but not for the general case (which we can't,
and presumably need to force to memory)? I.e. for what cases do we want
the final return to be (sometimes) true? If it's going to be forced
into memory anyway then wouldn't we get better optimisation by exposing
that early?
Would it be possible to handle the split during expand instead?
Or do we expect to discover new FP constants during RTL optimisation?
If so, where do they come from?
Sorry for all the questions :)
Richard
> +}
>
> /* Return the fixed registers used for condition codes. */
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 20956fc49d8232763b127629ded17037ad7d7960..5d3fa9628952031f52474291e160b957d774b011 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1644,8 +1644,7 @@ (define_expand "mov<mode>"
> (define_insn "*mov<mode>_aarch64"
> [(set (match_operand:HFBF 0 "nonimmediate_operand")
> (match_operand:HFBF 1 "general_operand"))]
> - "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> - || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> + "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
> {@ [ cons: =0 , 1 ; attrs: type , arch ]
> [ w , Y ; neon_move , simd ] movi\t%0.4h, #0
> [ w , ?rY ; f_mcr , fp16 ] fmov\t%h0, %w1
> @@ -1668,8 +1667,7 @@ (define_insn "*mov<mode>_aarch64"
> (define_insn "*mov<mode>_aarch64"
> [(set (match_operand:SFD 0 "nonimmediate_operand")
> (match_operand:SFD 1 "general_operand"))]
> - "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> - || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> + "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
> {@ [ cons: =0 , 1 ; attrs: type , arch ]
> [ w , Y ; neon_move , simd ] movi\t%0.2s, #0
> [ w , ?rY ; f_mcr , * ] fmov\t%s0, %w1
> @@ -1689,8 +1687,7 @@ (define_insn "*mov<mode>_aarch64"
> (define_insn "*mov<mode>_aarch64"
> [(set (match_operand:DFD 0 "nonimmediate_operand")
> (match_operand:DFD 1 "general_operand"))]
> - "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> - || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> + "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
> {@ [ cons: =0 , 1 ; attrs: type , arch ]
> [ w , Y ; neon_move , simd ] movi\t%d0, #0
> [ w , ?rY ; f_mcr , * ] fmov\t%d0, %x1
Hi Richard,
> It's ok for instructions to require properties that are false during
> early RTL passes and then transition to true. But they can't require
> properties that go from true to false, since that would mean that
> existing instructions become unrecognisable at certain points during
> the compilation process.
Only invalid cases are rejected - can_create_pseudo_p is used to reject
instructions that cannot be split after regalloc. This is basically a small
extension of that: we always split the aarch64_float_const_rtx_p case before
regalloc, and thus no such instructions should exist or be created during IRA.
So this simply blocks any code that tries to undo the split.
> Also, why are the conditions tighter for aarch64_float_const_rtx_p
> (which we can split) but not for the general case (which we can't,
> and presumably need to force to memory)? I.e. for what cases do we want
> the final return to be (sometimes) true? If it's going to be forced
> into memory anyway then wouldn't we get better optimisation by exposing
> that early?
This is the only case that is always split before regalloc. The forced into memory
case works exactly like it does now for all other FP immediates.
> Would it be possible to handle the split during expand instead?
> Or do we expect to discover new FP constants during RTL optimisation?
> If so, where do they come from?
The split is done during the split passes. The issue is that combine_and_move
undoes this split during IRA and creates new FP constants that then need to be
split, but they aren't, because we don't run split passes during IRA.
Cheers,
Wilco
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> It's ok for instructions to require properties that are false during
>> early RTL passes and then transition to true. But they can't require
>> properties that go from true to false, since that would mean that
>> existing instructions become unrecognisable at certain points during
>> the compilation process.
>
> Only invalid cases are rejected - can_create_pseudo_p is used to reject
> instructions that cannot be split after regalloc. This is basically a small
> extension to that: we always split the aarch64_float_const_rtx_p case before
> regalloc, and thus no such instructions should exist/created during IRA.
>
> So this simply blocks any code that tries to undo the split.
But I think you missed my point. Suppose we have:
(define_insn
[(...)]
"FOO"
"..."
)
In general, as the RTL pipeline progresses, FOO is only allowed to go
from 0 to 1. It shouldn't go from 1 back to 0.
That's because, once an instruction matches, the instruction should
continue to match. It should always be possible to set the INSN_CODE of
an existing instruction to -1, rerun recog, and get the same instruction
code back.
Because of that, insn conditions shouldn't depend on can_create_pseudo_p.
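A minimal sketch of that invariant in GCC-internals terms (illustrative only; insn stands for any instruction already in the stream):

  int old_code = INSN_CODE (insn);
  INSN_CODE (insn) = -1;
  /* Re-recognition must succeed with the same code, whatever pass we are in.
     A condition that flips from true to false, such as can_create_pseudo_p,
     would make this assert fire.  */
  gcc_assert (recog_memoized (insn) == old_code);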
>> Also, why are the conditions tighter for aarch64_float_const_rtx_p
>> (which we can split) but not for the general case (which we can't,
>> and presumably need to force to memory)? I.e. for what cases do we want
>> the final return to be (sometimes) true? If it's going to be forced
>> into memory anyway then wouldn't we get better optimisation by exposing
>> that early?
>
> This is the only case that is always split before regalloc. The forced into memory
> case works exactly like it does now for all other FP immediates.
>
>> Would it be possible to handle the split during expand instead?
>> Or do we expect to discover new FP constants during RTL optimisation?
>> If so, where do they come from?
>
> The split is done during the split passes. The issue is that combine_and_move
> undoes this split during IRA and creates new FP constants that then require to be
> split, but they aren't because we don't run split passes during IRA.
Yeah, I realise it's done by the split pass at the moment. My question was:
why do we need to wait till then? Why can't we do it in expand instead?
Are there cases where we expect to discover new FP constants during RTL
optimisation that weren't present in gimple? And if so, which cases are
they? Where do the constants come from?
Thanks,
Richard
Hi Richard,
> That's because, once an instruction matches, the instruction should
> continue to match. It should always be possible to set the INSN_CODE of
> an existing instruction to -1, rerun recog, and get the same instruction
> code back.
>
> Because of that, insn conditions shouldn't depend on can_create_pseudo_p.
We should never get into that state since it would be incorrect. If, say, we
created a movdf after regalloc that needs a split or a literal load, it could not
match any alternative, so recog would fail.
> Yeah, I realise it's done by the split pass at the moment. My question was:
> why do we need to wait till then? Why can't we do it in expand instead?
We could split at a different time. But why would that make a difference?
As long as we allow all FP immediates at all times in movsf/df, we end up
with the same issue.
> Are there cases where we expect to discover new FP constants during RTL
> optimisation that weren't present in gimple? And if so, which cases are
> they? Where do the constants come from?
These constants are created by undoing the previous split (using REG_EQUIV
to just create a new movsf/movdf instruction). When the split happened is not
relevant. Even using UNSPEC would not work as long as there is a REG_EQUIV
somewhere.
Try my trivial example with -fno-schedule-insns and look at what happens
before/after IRA.
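For example, assuming the earlier test function is in f.c, the relevant dumps can be compared with something like:

  gcc -O2 -fno-schedule-insns -fdump-rtl-split1 -fdump-rtl-ira -S f.c

The split1 dump should show the constant already split into an integer move, and the IRA dump should show combine_and_move recreating the FP constant.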
Cheers,
Wilco
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> That's because, once an instruction matches, the instruction should
>> continue to match. It should always be possible to set the INSN_CODE of
>> an existing instruction to -1, rerun recog, and get the same instruction
>> code back.
>>
>> Because of that, insn conditions shouldn't depend on can_create_pseudo_p.
>
> We should never get into that state since it would be incorrect. If say we
> created a movdf after regalloc that needs a split or a literal load, it cannot
> match any alternative. So recog would fail.
But, as this series shows, it's possible to match new instructions after
split1 and before can_create_pseudo_p returns false. And in general, we
shouldn't rely on split1 for correctness.
>> Yeah, I realise it's done by the split pass at the moment. My question was:
>> why do we need to wait till then? Why can't we do it in expand instead?
>
> We could split at a different time. But why would that make a difference?
> As long as we allow all FP immediates at all times in movsf/df, we end up
> with the same issue.
The idea was that, if we did the split during expand, the movsf/df
define_insns would then only accept the immediates that their
constraints can handle.
>> Are there cases where we expect to discover new FP constants during RTL
>> optimisation that weren't present in gimple? And if so, which cases are
>> they? Where do the constants come from?
>
> These constants are created by undoing the previous split (using REG_EQUIV
> to just create a new movsf/movdf instruction). When the split happened is not
> relevant. Even using UNSPEC would not work as long as there is a REG_EQUIV
> somewhere.
Yeah, I realise that, which is why I think the expand approach would
be more robust. It would in some ways be similar to what we do for
symbolic constants.
An alternative would be to add a constraint for the kind of constants that
the split handles, and then use it in a new alternative of the move patterns.
The split could then happen at any time, before or after reload.
Thanks,
Richard
Hi Richard,
> The idea was that, if we did the split during expand, the movsf/df
> define_insns would then only accept the immediates that their
> constraints can handle.
Right, always disallowing these immediates works fine too (it seems
reload doesn't require all immediates to be valid), and then the split is
redundant. I've updated the patch:
v2: split during expand, remove movsf/df splitter
The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
combines moves. The movsf/df patterns allow all FP immediates since they rely
on a split pattern. However splits do not happen during IRA, so the result is
extra literal loads. To avoid this, split early during expand and block
creation of FP immediates that need this split.
double f(void) { return 128.0; }
-O2 -fno-schedule-insns gives:
adrp x0, .LC0
ldr d0, [x0, #:lo12:.LC0]
ret
After patch:
mov x0, 4638707616191610880
fmov d0, x0
ret
Passes bootstrap & regress, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarch64.md (movhf_aarch64): Use aarch64_valid_fp_move.
(movsf_aarch64): Likewise.
(movdf_aarch64): Likewise.
* config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
* config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
---
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 05d3258abf7b43342d9058dd1b365a1a0870cdc2..6da81556110c978a9de6f6fad5775c9d77771b10 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -843,6 +843,7 @@ bool aarch64_advsimd_struct_mode_p (machine_mode mode);
opt_machine_mode aarch64_vq_mode (scalar_mode);
opt_machine_mode aarch64_full_sve_mode (scalar_mode);
bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
+bool aarch64_valid_fp_move (rtx, rtx, machine_mode);
bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 00bcf18ae97cea94227c00798b7951daa255d360..ec2d391d1e3eb9bd28f66fb6ee85311b4ced4c94 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -11175,6 +11175,36 @@ aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode)
return aarch64_simd_valid_mov_imm (v_op);
}
+/* Return TRUE if DST and SRC with mode MODE is a valid fp move. */
+bool
+aarch64_valid_fp_move (rtx dst, rtx src, machine_mode mode)
+{
+ if (!TARGET_FLOAT)
+ return false;
+
+ if (aarch64_reg_or_fp_zero (src, mode))
+ return true;
+
+ if (!register_operand (dst, mode))
+ return false;
+
+ if (MEM_P (src))
+ return true;
+
+ if (!DECIMAL_FLOAT_MODE_P (mode))
+ {
+ if (aarch64_can_const_movi_rtx_p (src, mode)
+ || aarch64_float_const_representable_p (src)
+ || aarch64_float_const_zero_rtx_p (src))
+ return true;
+
+ /* Block FP immediates which are split during expand. */
+ if (aarch64_float_const_rtx_p (src))
+ return false;
+ }
+
+ return can_create_pseudo_p ();
+}
/* Return the fixed registers used for condition codes. */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8d10197c9e8dd66b7a30a1034b629297b9992661..b865ae2ff5e23edc4d8990e1efd4a51dd195f41f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1636,14 +1636,33 @@ (define_expand "mov<mode>"
&& ! (GET_CODE (operands[1]) == CONST_DOUBLE
&& aarch64_float_const_zero_rtx_p (operands[1])))
operands[1] = force_reg (<MODE>mode, operands[1]);
+
+ if (!DECIMAL_FLOAT_MODE_P (<MODE>mode)
+ && GET_CODE (operands[1]) == CONST_DOUBLE
+ && can_create_pseudo_p ()
+ && !aarch64_can_const_movi_rtx_p (operands[1], <MODE>mode)
+ && !aarch64_float_const_representable_p (operands[1])
+ && !aarch64_float_const_zero_rtx_p (operands[1])
+ && aarch64_float_const_rtx_p (operands[1]))
+ {
+ unsigned HOST_WIDE_INT ival;
+ bool res = aarch64_reinterpret_float_as_int (operands[1], &ival);
+ gcc_assert (res);
+
+ machine_mode intmode
+ = int_mode_for_size (GET_MODE_BITSIZE (<MODE>mode), 0).require ();
+ rtx tmp = gen_reg_rtx (intmode);
+ emit_move_insn (tmp, gen_int_mode (ival, intmode));
+ emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
+ DONE;
+ }
}
)
(define_insn "*mov<mode>_aarch64"
[(set (match_operand:HFBF 0 "nonimmediate_operand")
(match_operand:HFBF 1 "general_operand"))]
- "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
- || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
+ "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
{@ [ cons: =0 , 1 ; attrs: type , arch ]
[ w , Y ; neon_move , simd ] movi\t%0.4h, #0
[ w , ?rY ; f_mcr , fp16 ] fmov\t%h0, %w1
@@ -1666,8 +1685,7 @@ (define_insn "*mov<mode>_aarch64"
(define_insn "*mov<mode>_aarch64"
[(set (match_operand:SFD 0 "nonimmediate_operand")
(match_operand:SFD 1 "general_operand"))]
- "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
- || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
+ "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
{@ [ cons: =0 , 1 ; attrs: type , arch ]
[ w , Y ; neon_move , simd ] movi\t%0.2s, #0
[ w , ?rY ; f_mcr , * ] fmov\t%s0, %w1
@@ -1687,8 +1705,7 @@ (define_insn "*mov<mode>_aarch64"
(define_insn "*mov<mode>_aarch64"
[(set (match_operand:DFD 0 "nonimmediate_operand")
(match_operand:DFD 1 "general_operand"))]
- "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
- || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
+ "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
{@ [ cons: =0 , 1 ; attrs: type , arch ]
[ w , Y ; neon_move , simd ] movi\t%d0, #0
[ w , ?rY ; f_mcr , * ] fmov\t%d0, %x1
@@ -1705,27 +1722,6 @@ (define_insn "*mov<mode>_aarch64"
}
)
-(define_split
- [(set (match_operand:GPF_HF 0 "nonimmediate_operand")
- (match_operand:GPF_HF 1 "const_double_operand"))]
- "can_create_pseudo_p ()
- && !aarch64_can_const_movi_rtx_p (operands[1], <MODE>mode)
- && !aarch64_float_const_representable_p (operands[1])
- && !aarch64_float_const_zero_rtx_p (operands[1])
- && aarch64_float_const_rtx_p (operands[1])"
- [(const_int 0)]
- {
- unsigned HOST_WIDE_INT ival;
- if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
- FAIL;
-
- rtx tmp = gen_reg_rtx (<FCVT_TARGET>mode);
- emit_move_insn (tmp, gen_int_mode (ival, <FCVT_TARGET>mode));
- emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
- DONE;
- }
-)
-
(define_insn "*mov<mode>_aarch64"
[(set (match_operand:TFD 0
"nonimmediate_operand" "=w,w,?r ,w ,?r,w,?w,w,m,?r,m ,m")
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> The idea was that, if we did the split during expand, the movsf/df
>> define_insns would then only accept the immediates that their
>> constraints can handle.
>
> Right, always disallowing these immediates works fine too (it seems
> reload doesn't require all immediates to be valid), and then the split is
> redundant. I've updated the patch:
>
>
> v2: split during expand, remove movsf/df splitter
>
> The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
> combines moves. The movsf/df patterns allow all FP immediates since they rely
> on a split pattern. However splits do not happen during IRA, so the result is
> extra literal loads. To avoid this, split early during expand and block
> creation of FP immediates that need this split.
>
> double f(void) { return 128.0; }
>
> -O2 -fno-schedule-insns gives:
>
> adrp x0, .LC0
> ldr d0, [x0, #:lo12:.LC0]
> ret
>
> After patch:
>
> mov x0, 4638707616191610880
> fmov d0, x0
> ret
>
> Passes bootstrap & regress, OK for commit?
>
> gcc/ChangeLog:
> * config/aarch64/aarch64.md (movhf_aarch64): Use aarch64_valid_fp_move.
> (movsf_aarch64): Likewise.
> (movdf_aarch64): Likewise.
> * config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
> * config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
Thanks, this mostly looks good, but...
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 05d3258abf7b43342d9058dd1b365a1a0870cdc2..6da81556110c978a9de6f6fad5775c9d77771b10 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -843,6 +843,7 @@ bool aarch64_advsimd_struct_mode_p (machine_mode mode);
> opt_machine_mode aarch64_vq_mode (scalar_mode);
> opt_machine_mode aarch64_full_sve_mode (scalar_mode);
> bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
> +bool aarch64_valid_fp_move (rtx, rtx, machine_mode);
> bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
> bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
> HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 00bcf18ae97cea94227c00798b7951daa255d360..ec2d391d1e3eb9bd28f66fb6ee85311b4ced4c94 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -11175,6 +11175,36 @@ aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode)
> return aarch64_simd_valid_mov_imm (v_op);
> }
>
> +/* Return TRUE if DST and SRC with mode MODE is a valid fp move. */
> +bool
> +aarch64_valid_fp_move (rtx dst, rtx src, machine_mode mode)
> +{
> + if (!TARGET_FLOAT)
> + return false;
> +
> + if (aarch64_reg_or_fp_zero (src, mode))
> + return true;
> +
> + if (!register_operand (dst, mode))
> + return false;
> +
> + if (MEM_P (src))
> + return true;
> +
> + if (!DECIMAL_FLOAT_MODE_P (mode))
> + {
> + if (aarch64_can_const_movi_rtx_p (src, mode)
> + || aarch64_float_const_representable_p (src)
> + || aarch64_float_const_zero_rtx_p (src))
> + return true;
> +
> + /* Block FP immediates which are split during expand. */
> + if (aarch64_float_const_rtx_p (src))
> + return false;
> + }
> +
> + return can_create_pseudo_p ();
...I still think we should avoid testing can_create_pseudo_p.
Does it work with the last part replaced by:
if (!DECIMAL_FLOAT_MODE_P (mode))
{
if (aarch64_can_const_movi_rtx_p (src, mode)
|| aarch64_float_const_representable_p (src)
|| aarch64_float_const_zero_rtx_p (src))
return true;
}
return false;
? If so, the patch is OK with that change from my POV, but please wait
till Thursday morning for others' opinions.
Richard
> +}
>
> /* Return the fixed registers used for condition codes. */
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 8d10197c9e8dd66b7a30a1034b629297b9992661..b865ae2ff5e23edc4d8990e1efd4a51dd195f41f 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1636,14 +1636,33 @@ (define_expand "mov<mode>"
> && ! (GET_CODE (operands[1]) == CONST_DOUBLE
> && aarch64_float_const_zero_rtx_p (operands[1])))
> operands[1] = force_reg (<MODE>mode, operands[1]);
> +
> + if (!DECIMAL_FLOAT_MODE_P (<MODE>mode)
> + && GET_CODE (operands[1]) == CONST_DOUBLE
> + && can_create_pseudo_p ()
> + && !aarch64_can_const_movi_rtx_p (operands[1], <MODE>mode)
> + && !aarch64_float_const_representable_p (operands[1])
> + && !aarch64_float_const_zero_rtx_p (operands[1])
> + && aarch64_float_const_rtx_p (operands[1]))
> + {
> + unsigned HOST_WIDE_INT ival;
> + bool res = aarch64_reinterpret_float_as_int (operands[1], &ival);
> + gcc_assert (res);
> +
> + machine_mode intmode
> + = int_mode_for_size (GET_MODE_BITSIZE (<MODE>mode), 0).require ();
> + rtx tmp = gen_reg_rtx (intmode);
> + emit_move_insn (tmp, gen_int_mode (ival, intmode));
> + emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
> + DONE;
> + }
> }
> )
>
> (define_insn "*mov<mode>_aarch64"
> [(set (match_operand:HFBF 0 "nonimmediate_operand")
> (match_operand:HFBF 1 "general_operand"))]
> - "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> - || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> + "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
> {@ [ cons: =0 , 1 ; attrs: type , arch ]
> [ w , Y ; neon_move , simd ] movi\t%0.4h, #0
> [ w , ?rY ; f_mcr , fp16 ] fmov\t%h0, %w1
> @@ -1666,8 +1685,7 @@ (define_insn "*mov<mode>_aarch64"
> (define_insn "*mov<mode>_aarch64"
> [(set (match_operand:SFD 0 "nonimmediate_operand")
> (match_operand:SFD 1 "general_operand"))]
> - "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> - || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> + "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
> {@ [ cons: =0 , 1 ; attrs: type , arch ]
> [ w , Y ; neon_move , simd ] movi\t%0.2s, #0
> [ w , ?rY ; f_mcr , * ] fmov\t%s0, %w1
> @@ -1687,8 +1705,7 @@ (define_insn "*mov<mode>_aarch64"
> (define_insn "*mov<mode>_aarch64"
> [(set (match_operand:DFD 0 "nonimmediate_operand")
> (match_operand:DFD 1 "general_operand"))]
> - "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> - || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> + "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
> {@ [ cons: =0 , 1 ; attrs: type , arch ]
> [ w , Y ; neon_move , simd ] movi\t%d0, #0
> [ w , ?rY ; f_mcr , * ] fmov\t%d0, %x1
> @@ -1705,27 +1722,6 @@ (define_insn "*mov<mode>_aarch64"
> }
> )
>
> -(define_split
> - [(set (match_operand:GPF_HF 0 "nonimmediate_operand")
> - (match_operand:GPF_HF 1 "const_double_operand"))]
> - "can_create_pseudo_p ()
> - && !aarch64_can_const_movi_rtx_p (operands[1], <MODE>mode)
> - && !aarch64_float_const_representable_p (operands[1])
> - && !aarch64_float_const_zero_rtx_p (operands[1])
> - && aarch64_float_const_rtx_p (operands[1])"
> - [(const_int 0)]
> - {
> - unsigned HOST_WIDE_INT ival;
> - if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
> - FAIL;
> -
> - rtx tmp = gen_reg_rtx (<FCVT_TARGET>mode);
> - emit_move_insn (tmp, gen_int_mode (ival, <FCVT_TARGET>mode));
> - emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
> - DONE;
> - }
> -)
> -
> (define_insn "*mov<mode>_aarch64"
> [(set (match_operand:TFD 0
> "nonimmediate_operand" "=w,w,?r ,w ,?r,w,?w,w,m,?r,m ,m")
Hi Richard,
> ...I still think we should avoid testing can_create_pseudo_p.
> Does it work with the last part replaced by:
>
> if (!DECIMAL_FLOAT_MODE_P (mode))
> {
> if (aarch64_can_const_movi_rtx_p (src, mode)
> || aarch64_float_const_representable_p (src)
> || aarch64_float_const_zero_rtx_p (src))
> return true;
> }
>
> return false;
>
> ? If so, the patch is OK with that change from my POV, but please wait
> till Thursday morning for others' opinions.
With that change every FP literal load causes an internal error, so literal loads must
be allowed. I could change the can_create_pseudo_p to true. This generates identical
code, but it allows literal loads to be created after regalloc (which should ultimately
crash, as there is no valid alternative).
Cheers,
Wilco